46.1 Waves in a gauge theory
46.2 Lorenz gauge for gravitational waves
46.3 Quadrupolar radiation
46.4 Radiated energy and power
46.5 An exact solution
46.6 The discovery of gravitational waves
Chapter summary
Exercises
${ }^{1}$ LIGO stands for Laser Interferometer Gravitational-Wave Observatory.
${ }^{2}$ In the presence of sources, the equation is
Like as the waves make towards the pebbled shore, So do our minutes hasten to their end
William Shakespeare (1564-1616) Sonnet 60
In this chapter, we discuss the waves that can propagate as excitations of the gravitational field. These waves were predicted by Einstein in 1916 on the basis of general relativity (they had been suggested previously by Henri Poincaré) and have been the subject of two relatively recent Nobel prizes: the 1993 award to Hulse and Taylor, whose work on binary pulsars offered indirect evidence for the waves, and the 2017 prize for Weiss, Thorne and Barish. The latter was awarded in the wake of the direct observation (by the LIGO${ }^{1}$ collaboration) of gravitational waves that resulted from the merger of two black holes. We describe the LIGO experiment at the end of the chapter. We start, however, with a review of electromagnetic waves, before repeating the argument for waves in a weak gravitational field.
46.1 Waves in a gauge theory
We saw in Chapter 42 that, in flat spacetime with no sources, an equation of motion for the electromagnetic field can be written as${ }^{2}$
This equation tells us that the electromagnetic field has dynamics of its own, independent of the presence of electric charges. These dynamics are wave-like excitations of the field.
As usual, we're free to make changes to $\tilde{\boldsymbol{A}}$ subject to $A_{\mu}(x) \rightarrow A_{\mu}(x)-\partial_{\mu} \chi(x)$, since such changes in gauge do not alter the dynamics of the electromagnetic field, nor the things coupled to that field.
Example 46.1
As discussed in Chapter 42, we choose the Lorenz gauge, which means we define a new, but physically equivalent, gauge field with components $A_{\mu}^{\prime}$ which obey the constraint $\partial_{\mu} A^{\prime \mu}=0$, helpfully knocking out the first term in eqn 46.1. This leaves us with a simpler looking equation of motion
\begin{equation*}
\partial^{2} A^{\prime \mu}=0
\end{equation*}
whose solutions are plane waves of the form${ }^{3}$
where $\omega=|\vec{k}|$ and $\operatorname{Re}[\,]$ reminds us to take the real part of a complex expression. However, we also saw in Chapter 42 that since the Lorenz gauge still leaves some freedom,${ }^{4}$ we could then impose the Coulomb gauge, upgrading to a new gauge field with components $A_{\mu}^{\prime \prime}$ where $A_{0}^{\prime \prime}=0$. With this further choice, the Lorenz condition then becomes $\vec{\nabla} \cdot \vec{A}^{\prime \prime}=0$, which further reduces the number of independent field components by one. This makes it clear that although the electromagnetic field has four components, the physics allows only two independent components.
The equations of motion${ }^{5}$ in the Lorenz gauge read $\partial^{2} A^{\mu}=0$, which, with $A^{0}=0$, has plane wave solutions $\vec{A}=\vec{\epsilon}\, \mathrm{e}^{\mathrm{i} k \cdot x}$. The equation encoding the Coulomb gauge condition, $\vec{\nabla} \cdot \vec{A}=0$, leads to
\begin{equation*}
\vec{k} \cdot \vec{\epsilon}=0
\end{equation*}
which tells us that the direction of propagation of the wave is perpendicular to the polarization $\vec{\epsilon}$, i.e. the wave is transverse. For a wave propagating along $z$ with null momentum components $k^{\mu}=(|\vec{k}|, 0,0,|\vec{k}|)$, the components of the electromagnetic field must then be functions of the form
Comparing with our plane wave, we spot solutions such as $A^{j}=\epsilon^{j}(\boldsymbol{k})\, \mathrm{e}^{-\mathrm{i}(\omega t-|\vec{k}| z)}$, for $j=x$ and $y$, with${ }^{6}$ $\omega /|\vec{k}|=1$. Putting everything together, we see that two possible choices of basis polarization vectors are simply
corresponding to linear polarization along $x$ or $y$, respectively. This explains, in classical terms, the propagation of light waves in the electromagnetic field.${ }^{7}$
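The transversality argument above can be checked numerically. The following is a minimal sketch, assuming units with $c=1$ and the metric signature $(-,+,+,+)$; the value chosen for $|\vec{k}|$, the unit polarization vectors and the helper name `minkowski_dot` are illustrative assumptions rather than anything fixed by the text.

```python
# Metric signature (-, +, +, +) and units with c = 1 (assumed).
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of the Minkowski metric


def minkowski_dot(a, b):
    """Contract two four-vectors (upper indices) with the flat metric."""
    return sum(g * x * y for g, x, y in zip(ETA, a, b))


k_mag = 2.5                    # illustrative value of |k|; any positive number works
k = [k_mag, 0.0, 0.0, k_mag]   # k^mu = (|k|, 0, 0, |k|): wave travelling along z

# Candidate polarizations in the gauge A^0 = 0 (vanishing time component).
eps_x = [0.0, 1.0, 0.0, 0.0]
eps_y = [0.0, 0.0, 1.0, 0.0]
eps_z = [0.0, 0.0, 0.0, 1.0]   # longitudinal: expected to fail the transversality test

k_squared = minkowski_dot(k, k)  # vanishes for a null wavevector
transverse = {
    name: minkowski_dot(k, eps) == 0.0
    for name, eps in [("x", eps_x), ("y", eps_y), ("z", eps_z)]
}
print(k_squared, transverse)
```

Only the $x$ and $y$ polarizations pass the test, which is the numerical counterpart of the statement that two independent components survive.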
Now we repeat the argument for gravitation in the weak-field limit. Using the gravitational version of the Lorenz gauge $\partial^{\nu} \bar{h}_{\mu \nu}=0$, we have a wave equation for gravity in the absence of sources, that says${ }^{8}$
The good news is that this looks a lot like the wave equation for the electromagnetic field. It does however involve a second index, meaning that the polarizations of the fields that solve this wave equation have to be represented as square matrices, rather than simply as column vectors, as we had in the case of the electromagnetic field.
Let's consider an ansatz in the form of a plane gravitational wave. This can be written as
where $\operatorname{Re}[\,]$ again reminds us that we must take the real part of this plane wave solution to describe the physical amplitude of the wave.
Example 46.2
For this to work within the Lorenz gauge we require $\partial^{\nu} \bar{h}_{\mu \nu}=0$, or
To satisfy the field equations we must also have $\partial^{2} \bar{h}_{\mu \nu}=0$ (eqn 46.8), so that
\begin{equation*}
-k_{\sigma} k^{\sigma} A_{\mu \nu} \mathrm{e}^{\mathrm{i} k \cdot x}=0
\end{equation*}
${ }^{3}$ The term $k_{\mu} x^{\mu}$ is written here as $\boldsymbol{k} \cdot \boldsymbol{x}$. We also drop the prime on the field $A^{\prime \mu}$.
${ }^{4}$ Recall that this is because we can make a further shift $A_{\mu}^{\prime} \rightarrow A_{\mu}^{\prime \prime}=A_{\mu}^{\prime}-\partial_{\mu} \xi$ as long as $\partial^{2} \xi=0$ (so that both $A_{\mu}^{\prime}$ and $A_{\mu}^{\prime \prime}$ satisfy the Lorenz condition).
${ }^{5}$ We are now going to focus on $A_{\mu}^{\prime \prime}$, so will drop the double prime from now on. We also suppress the $\operatorname{Re}[\,]$ notation, which is assumed for the wave solutions.
${ }^{6}$ Of course, this implies that in SI units $\omega /|\vec{k}|=c$.
${ }^{7}$ The polarization vectors introduced here carry the information about the spin state of the photon. Further aspects of the quantum-mechanical treatment are the topic of the following
${ }^{8}$ Remember that when sources are present, the key equation is
We will return to this later when we put the sources back in, see eqn 46.39.
${ }^{9}$ The following argument will make this statement a little more convincing. Let's consider a gravitational plane wave in $\bar{h}_{\mu \nu}$, whose components are constant on a surface on which its phase $\boldsymbol{k} \cdot \boldsymbol{x}=k_{\mu} x^{\mu}$ is constant. A photon moving in the direction of the null vector $\boldsymbol{k}$ travels on the curve
\begin{equation*}
x^{\mu}(\lambda)=k^{\mu} \lambda+l^{\mu}
\end{equation*}
where $l^{\mu}$ are the components of a constant vector and $\lambda$ parametrizes the curve. Dotting the equation for the curve with $k_{\mu}$ and noting $\boldsymbol{k} \cdot \boldsymbol{k}=0$, we find
\begin{equation*}
k_{\mu} x^{\mu}(\lambda)=k_{\mu} l^{\mu}=\text{constant}
\end{equation*}
This implies that the photon wave and gravitational wave share the same surfaces on which their respective phases are constant, and in fact their respective phases can only differ by a constant scalar value. Thus, the two waves move essentially in lockstep. We can therefore conclude that the gravitational wave travels at the speed of light with $\vec{k}$ giving its direction of travel.
${ }^{10}$ The analogous expression for electromagnetism was $\partial^{2} \xi=0$. As you might expect, the gravitational case looks very similar but has an extra index.
As a consequence of the last example, we have that the amplitude $A_{\mu \nu}$ and wavevector $k^{\mu}$ components obey two constraints
\begin{equation*}
A_{\mu \nu} k^{\nu}=0, \quad k_{\sigma} k^{\sigma}=0
\end{equation*}
We conclude from the second expression that the gravitational wave is null, implying that the gravitational field propagates at the speed of light.${ }^{9}$ From the first condition we have that the wave's amplitude is orthogonal to its direction, making it a transverse plane wave. Analogous to the electromagnetic plane wave travelling along the $z$-direction, we have that the components of the gravitational field are given by functions
Also by analogy with the electromagnetic plane wave, we would like to make a further choice of gauge to guarantee that $\bar{h}^{\mu 0}=0$. It turns out that we can do exactly this, as we shall see in the next section.
46.2 Lorenz gauge for gravitational waves
We have already chosen the Lorenz gauge to guarantee the wave equation but, just as in the electromagnetic case, this doesn't exhaust the gauge freedom. Remembering that the gauge transformation we are considering is $x^{\mu} \rightarrow x^{\mu}+\xi^{\mu}$, we note by analogy with electromagnetism that we can still obey Lorenz gauge as long as we have${ }^{10}$
Gravitational waves are a little more complicated than their electromagnetic cousins, so we shall proceed step by step.
Example 46.3
We choose $\xi_{\alpha}=B_{\alpha} \mathrm{e}^{\mathrm{i} k \cdot x}$, so that the transformation is oscillatory in spacetime with amplitude $B_{\alpha}$. The usual change in $\boldsymbol{h}$ resulting from a gauge transformation is given by
which means, for the trace-reversed components, that
This satisfies our previous constraint $k^{\alpha} A_{\alpha \beta}^{\prime}=0$, as long as $A_{\alpha \beta}$ does too.
The amplitude of the transformation $B_{\alpha}$ is then chosen in such a way as to impose two additional (highly simplifying) constraints on the amplitude components $A_{\mu \nu}$ of our wave-like solution. These are
\begin{equation*}
A^{\mu}{ }_{\mu}=0, \quad A_{\alpha \beta} u^{\beta}=0
\end{equation*}
where, in the second expression, $u^{\beta}$ are the components of a fixed velocity. The first condition tells us that the wave is traceless, which means that, in this gauge, $h_{\mu \nu}=\bar{h}_{\mu \nu}$. The second says that the wave is orthogonal to a velocity vector $\boldsymbol{u}$. This state of affairs is known as transverse-traceless gauge.
Example 46.4
Choose a local inertial frame in which $\boldsymbol{u}$ has components $u^{\mu}=(1,0,0,0)$. We then have from $A_{\alpha \beta} u^{\beta}=0$ (eqn 46.20) the condition that we wanted, that $A_{\alpha 0}=0$. We arrange for the wave to travel along $z$, so we have $\boldsymbol{k}$ with components $k^{\mu}=(|\vec{k}|, 0,0,|\vec{k}|)$. This means from $A_{\alpha \beta} k^{\beta}=0$ (eqn 46.12) that $A_{\alpha z}=0$ too. We therefore have a possibility of non-zero matrix elements for $A_{x x}, A_{y y}, A_{x y}$ and $A_{y x}$. Since the wave is traceless, we must have $A_{x x}=-A_{y y}$. By symmetry $A_{x y}=A_{y x}$. We therefore have, in this frame, the amplitude components
\begin{equation*}
A_{\alpha \beta}=\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & A_{x x} & A_{x y} & 0 \\
0 & A_{x y} & -A_{x x} & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\end{equation*}
We then have a simplified solution to the wave equation $\bar{h}_{\alpha \beta}=A_{\alpha \beta} \mathrm{e}^{\mathrm{i} k \cdot x}$, with $\omega=|\vec{k}|$. Remembering that $g_{\mu \nu}=\eta_{\mu \nu}+h_{\mu \nu}$, and also that $\bar{h}_{\mu \nu}=h_{\mu \nu}$ with our choice of gauge, we deduce that this solution results in a metric line element
The two independent solutions represented here can be disentangled as follows. If $A_{x y}=0$, then our metric reduces to
\begin{equation*}
\mathrm{d} s^{2}=-\mathrm{d} t^{2}+\left(1+A_{x x} \mathrm{e}^{-\mathrm{i}(\omega t-|k| z)}\right) \mathrm{d} x^{2}+\left(1-A_{x x} \mathrm{e}^{-\mathrm{i}(\omega t-|k| z)}\right) \mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{46.23}
\end{equation*}
On the other hand, if $A_{x x}=0$, our metric reduces to
Each of these solutions represents a plane gravitational wave, and eqns 46.23 and 46.24 are related to each other by a $45^{\circ}$ rotation.
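The counting of components in Example 46.4 can be confirmed with a few lines of code. This is a sketch under stated assumptions: indices are ordered $(t, x, y, z)$, the metric is $\operatorname{diag}(-1,1,1,1)$, and the function names and the sample values of $A_{xx}$, $A_{xy}$ and $|\vec{k}|$ are hypothetical choices made for illustration.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta (its inverse has the same entries)


def tt_amplitude(a_xx, a_xy):
    """Amplitude A_{alpha beta} for a wave along z in transverse-traceless gauge."""
    return [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, a_xx, a_xy, 0.0],
        [0.0, a_xy, -a_xx, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]


def contract(amp, vec):
    """Return the components A_{alpha beta} v^beta."""
    return [sum(amp[a][b] * vec[b] for b in range(4)) for a in range(4)]


def trace(amp):
    """Return eta^{alpha beta} A_{alpha beta}."""
    return sum(ETA[a] * amp[a][a] for a in range(4))


A = tt_amplitude(0.3, -0.7)   # illustrative values only
u = [1.0, 0.0, 0.0, 0.0]      # observer at rest in the chosen frame
k = [1.5, 0.0, 0.0, 1.5]      # null wavevector along z, |k| = 1.5 (illustrative)

print(trace(A), contract(A, u), contract(A, k))
```

All three quantities vanish for any choice of the two free numbers, so the wave carries exactly two independent amplitudes.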
The metric line element has some oscillatory terms in it, but what does this mean? Does this mean that individual masses suspended in space will bob up and down as a gravitational wave goes past, just like boats on the ocean do when ocean waves go past? It's a bit more complicated than that, as shown in the next example.
Example 46.5
Place a test mass at rest at the origin. Its velocity is then $\dot{x}^{\mu}=(1,0,0,0)$. The geodesic equation, eqn 8.24, is $\ddot{x}^{\mu}+\Gamma_{\alpha \beta}^{\mu} \dot{x}^{\alpha} \dot{x}^{\beta}=0$, so that
However, for a gravitational wave we must have $\Gamma_{00}^{i}=0$, since $\Gamma_{a b}^{i}=\frac{1}{2} \eta^{i c}\left(\partial_{a} h_{c b}+\partial_{b} h_{c a}-\partial_{c} h_{a b}\right)$ vanishes for $a=b=0$ because $h_{0 a}=0$. This implies that $\ddot{x}^{i}=0$, so that the particle doesn't move. Oh dear; this is not what we wanted!
However, the fact that the coordinate of our test mass does not change as the gravitational wave rolls past does not mean anything. By now, we have learned to be suspicious of coordinates which can be chosen in lots of different ways. We know the metric line element does oscillate, so for example if we put a first test mass at the origin and a second test mass displaced a distance $L$ in the $x$-direction, then the distance between them should be
\begin{equation*}
\int_{0}^{L} \sqrt{g_{x x}}\, \mathrm{d} x=\operatorname{Re}\left[L \sqrt{1+A_{x x} \mathrm{e}^{-\mathrm{i}(\omega t-|k| z)}}\right] \approx L+\frac{L A_{x x}}{2} \cos (\omega t-|k| z)
\end{equation*}
Happily this does oscillate, and also gives a method of detecting gravitational waves: by measuring the distance between pairs of masses as a function of time.
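To see how good the first-order approximation to the proper distance is, one can compare the two sides of the expression above numerically. The sketch below assumes a real amplitude $A_{xx}$ and treats the phase $\omega t-|k| z$ as a single parameter; the arm length, the (hugely exaggerated) amplitude and the function names are illustrative assumptions. Because $g_{xx}$ does not depend on $x$ at fixed $t$ and $z$, the integral reduces to $L \sqrt{g_{xx}}$.

```python
import cmath
import math


def proper_distance_exact(length, amplitude, phase):
    """Re[L * sqrt(1 + A_xx * exp(-i * phase))], with phase = omega*t - |k|*z."""
    return (length * cmath.sqrt(1.0 + amplitude * cmath.exp(-1j * phase))).real


def proper_distance_linear(length, amplitude, phase):
    """First-order approximation L + (L * A_xx / 2) * cos(phase)."""
    return length + 0.5 * length * amplitude * math.cos(phase)


L_ARM = 4000.0  # coordinate separation in metres (illustrative)
A_XX = 1.0e-3   # exaggerated amplitude so that the oscillation is visible

phases = [2.0 * math.pi * n / 16 for n in range(16)]
worst_error = max(
    abs(proper_distance_exact(L_ARM, A_XX, p) - proper_distance_linear(L_ARM, A_XX, p))
    for p in phases
)
stretch = proper_distance_exact(L_ARM, A_XX, 0.0) - L_ARM      # phase 0: masses farther apart
squeeze = proper_distance_exact(L_ARM, A_XX, math.pi) - L_ARM  # phase pi: masses closer
print(worst_error, stretch, squeeze)
```

The mismatch between the two expressions is second order in the amplitude, so for realistic amplitudes the linear formula is effectively exact.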
We can therefore understand gravitational waves by assessing their influence on groups of tiny test masses. Let's therefore examine in more detail the geodesic deviation of a set of particles. Geodesic deviation is described by the equation
where $\boldsymbol{n}$ is the separation vector of the particles and the velocity $\boldsymbol{u}$ is tangent to the streamlines formed by the geodesics. We will work in the local frame of the particles, amounting to a choice of the components of $\boldsymbol{u}$ as $u^{\mu}=(1,0,0,0)$, so all we need to do is compute the relevant components of $\boldsymbol{R}$ from the components of the $\boldsymbol{h}$ field.
Example 46.6
With our choice of $\boldsymbol{u}$ the geodesic deviation expression becomes the component equation
For simplicity, let's choose $\boldsymbol{n}$ to initially have components $n^{\mu}=(0, a, 0,0)$, implying that the two masses are separated by the spacelike interval $a$ at the start of the motion. From the last chapter, we have that
Recalling also that we raise and lower indices in the weak-field limit using $\eta_{\mu \nu}=\operatorname{diag}(-1,1,1,1)$, we find that the components of the Riemann tensor relevant to the geodesic deviation equation then become
\begin{align*}
& R_{0 x 0}^{x}=R_{x 0 x 0}=-\frac{1}{2} \frac{\partial^{2} h_{x x}}{\partial t^{2}} \\
& R_{0 x 0}^{y}=R_{y 0 x 0}=-\frac{1}{2} \frac{\partial^{2} h_{x y}}{\partial t^{2}} \\
& R_{0 y 0}^{y}=R_{y 0 y 0}=-\frac{1}{2} \frac{\partial^{2} h_{y y}}{\partial t^{2}}=-R_{0 x 0}^{x} \tag{46.30}
\end{align*}
A further simplification is that, to first order in $h_{\mu \nu}$, we can make the replacement $\tau=t$. This means that the separation vector of the particles, originally separated along the $x$-direction by an interval $a$, obeys the equations of motion
\begin{equation*}
\frac{\partial^{2} n^{x}}{\partial t^{2}}=\frac{1}{2} a \frac{\partial^{2} h_{x x}}{\partial t^{2}}, \quad \frac{\partial^{2} n^{y}}{\partial t^{2}}=\frac{1}{2} a \frac{\partial^{2} h_{x y}}{\partial t^{2}} \tag{46.31}
\end{equation*}
By the same token, two particles initially separated along $y$ by a spacelike interval $a$ obey
\begin{equation*}
\frac{\partial^{2} n^{y}}{\partial t^{2}}=-\frac{1}{2} a \frac{\partial^{2} h_{x x}}{\partial t^{2}}, \quad \frac{\partial^{2} n^{x}}{\partial t^{2}}=\frac{1}{2} a \frac{\partial^{2} h_{x y}}{\partial t^{2}} \tag{46.32}
\end{equation*}
On integrating these differential equations twice, we find that the separation $n^{i}$ of particles can be described in terms of the components of the $\boldsymbol{h}$-field in transverse-traceless gauge by writing
where $h_{x x}^{\mathrm{TT}}=-h_{y y}^{\mathrm{TT}}=h_{+}$ and $h_{x y}^{\mathrm{TT}}=h_{y x}^{\mathrm{TT}}=h_{\times}$.
The equations from the previous example allow us to understand the polarization of the gravitational waves as causing motion of the particles arranged in a circle in Fig. 46.1(a). If the waves have $h_{x y}=0$ and $h_{x x} \neq 0$, then the pattern of displacements corresponds to that shown in Fig. 46.1(b), with the masses moving along the $x$ and $y$ directions out of phase by $180^{\circ}$. This is sometimes called the $+$ polarization. If, instead, we have $h_{x x}=0$ and $h_{x y} \neq 0$, then the pattern of displacements is that shown in Fig. 46.1(c). This is simply the pattern from Fig. 46.1(b) rotated by $45^{\circ}$, hence the name: $\times$ polarization. Notice how the two different polarizations are related by a $45^{\circ}$ rotation,${ }^{11}$ unlike the two electromagnetic linear polarizations, which are related by a $90^{\circ}$ rotation. This is a consequence of the tensorial nature of the $(0,2)$ gravitational $\boldsymbol{h}$ field, as opposed to the 1-form field $\tilde{\boldsymbol{A}}$ that expresses electromagnetism.
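The $45^{\circ}$ relationship between the two polarizations can be verified directly on a ring of test masses. The sketch below assumes that, for masses initially at relative rest, integrating eqns 46.31 and 46.32 gives $n^{i}=n_{0}^{i}+\frac{1}{2} h_{i j} n_{0}^{j}$ in the $x$-$y$ plane; the ring size, the exaggerated amplitude and the function names are illustrative assumptions.

```python
import math


def displaced(n0, h_plus, h_cross):
    """Apply n^i = n0^i + (1/2) h_ij n0^j with h_xx = -h_yy = h_plus, h_xy = h_cross."""
    x, y = n0
    dx = 0.5 * (h_plus * x + h_cross * y)
    dy = 0.5 * (h_cross * x - h_plus * y)
    return (x + dx, y + dy)


def rotate(v, angle):
    """Rotate a 2D vector anticlockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


H = 0.2  # exaggerated wave amplitude (illustrative)
ring = [(math.cos(2 * math.pi * n / 8), math.sin(2 * math.pi * n / 8)) for n in range(8)]

plus = [displaced(p, H, 0.0) for p in ring]
cross = [displaced(p, 0.0, H) for p in ring]

# Rotate the ring back by 45 degrees, apply the + distortion, rotate forward again.
quarter = math.pi / 4
cross_from_plus = [rotate(displaced(rotate(p, -quarter), H, 0.0), quarter) for p in ring]

max_mismatch = max(
    math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(cross, cross_from_plus)
)
print(plus[0], plus[2], max_mismatch)
```

For the $+$ polarization the mass on the $x$-axis moves outwards while the mass on the $y$-axis moves inwards, and the $\times$ pattern agrees with the rotated $+$ pattern to rounding error.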
Example 46.7
We can define the tidal field $\mathcal{E}$ with components $\mathcal{E}_{i j}=R_{0 i 0 j}$, which can be written in terms of the two possible polarizations of the waves as
where $\boldsymbol{e}_{+}=\boldsymbol{e}_{x} \otimes \boldsymbol{e}_{x}-\boldsymbol{e}_{y} \otimes \boldsymbol{e}_{y}$
and $\boldsymbol{e}_{\times}=\boldsymbol{e}_{x} \otimes \boldsymbol{e}_{y}+\boldsymbol{e}_{y} \otimes \boldsymbol{e}_{x}$
are polarization tensors.
We can also depict the two polarizations by looking at the accelerations that are produced by the waves. This is shown in the diagrams in Fig. 46.2, where the field lines represent the acceleration field $D^{2} \boldsymbol{n} / \mathrm{d} \tau^{2}$. The four diagrams display the field at different points in the wave cycle for the two polarizations. These field diagrams are reminiscent of the magnetic field from a quadrupolar magnet and in fact these do indeed represent quadrupolar fields. This point deserves some further exploration, which we will do in the following example, first considering the case of sources of electromagnetic radiation before considering the analogous case of sources of gravitational radiation.
Fig. 46.1 Polarizations of gravitational waves. (a) A circle of masses. (b) The $+$ polarization. (c) The $\times$ polarization.
${ }^{11}$ As we found in Example 46.4.
Fig. 46.2 Quadrupolar field lines representing the accelerations produced by a gravitational wave (a) with the $+$ polarization and (b) with the $\times$ polarization.
Example 46.8
(i) Electromagnetic waves: In the study of electromagnetic radiation, the multipole expansion of a source is a useful technique. Consider some object sitting in empty space, and assume it is made up of various charges, perhaps both positive and negative. We are interested in the electromagnetic field at some point distant from this object due to the effect of the charges within the object and their individual motions. The multipole expansion involves writing the charge distribution of this bounded object at a particular instant in time as a sum of terms of increasing complexity. We start off by adding up all the charge in the object and shrinking it to a point; at sufficient distance, the object will after all look like a point charge. The first term is then the monopolar contribution, the next term will be the dipole term, then we have a quadrupolar contribution. The charge in the object might be in some complicated motion, so we will also have some dynamic current-carrying contributions such as the magnetic dipole moment, magnetic quadrupole moment, etc. (no magnetic monopolar term though, since this turns out to be identically zero).
To get electromagnetic radiation out of this object, the charges within need to jiggle around. The acceleration of those charges could then produce electromagnetic waves. For example, if we could get the total charge in our object (the monopolar term) to oscillate up and down, we could generate spherically symmetric electromagnetic waves. However, this can't happen because we are not allowed to vary the total charge in our bounded object, charge being a conserved quantity. What we can do is vary the dipolar term in an oscillatory fashion, and this is how transmitters (and aerials) work. We make one end of an object positive and the other end negative by driving a current in one direction, and then by reversing the current we reverse the polarity of the dipole moment, making the formerly positive end negative and the formerly negative end positive. This oscillating dipole produces dipole radiation. Higher multipoles of electromagnetic radiation are possible, but charge conservation forbids the monopole term.
(ii) Gravitational waves: In the context of gravitation, we can consider an analogous multipole expansion of the mass distribution of an array of masses $m_{i}$ at positions $\vec{x}_{i}$. (For simplicity, we will put the origin of our coordinates at the centre of mass of this distribution.) We are looking for contributions that aren't conserved, since these can vary and therefore act as the source of gravitational waves.
The monopole contribution is the total mass $\sum_{i} m_{i}$ (also known as the zeroth mass moment $\mathcal{I}_{0}$), which is constant owing to mass conservation. The dipole moment of the distribution (also known as the first mass moment $\mathcal{I}_{1}$) is given by $\sum_{i} m_{i} \vec{x}_{i}$, but this is constant because of momentum conservation. The gravitational analogue of the magnetic moment is a first-order moment involving mass currents. It is given by the angular momentum $\vec{L}=\sum_{i} m_{i}\left(\vec{x}_{i} \times \dot{\vec{x}}_{i}\right)$. This first-order moment is therefore conserved because of angular momentum conservation. This implies that the lowest order contribution to gravitational radiation can only be from the next term: the quadrupolar field.
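A simple way to see which moments can vary is to evaluate them for a concrete system. The sketch below uses two equal masses on a circular orbit about their centre of mass, in arbitrary units; this choice of system, the units and the function names are illustrative assumptions. It tabulates the mass monopole, the mass dipole and the $xx$ component of the second mass moment at three instants.

```python
import math


def binary_positions(a, omega, t):
    """Two equal masses on a circular orbit of radius a about the centre of mass."""
    x = a * math.cos(omega * t)
    y = a * math.sin(omega * t)
    return [(x, y), (-x, -y)]


def mass_dipole(masses, positions):
    """First mass moment: sum_i m_i x_i (two components in the orbital plane)."""
    return tuple(sum(m * p[i] for m, p in zip(masses, positions)) for i in range(2))


def second_moment(masses, positions):
    """Second mass moment I_ij = sum_i m_i x_i x_j in the orbital plane."""
    return [
        [sum(m * p[i] * p[j] for m, p in zip(masses, positions)) for j in range(2)]
        for i in range(2)
    ]


masses = [1.0, 1.0]                          # arbitrary units
a, omega = 1.0, 1.0                          # arbitrary units
times = [0.0, math.pi / 4.0, math.pi / 2.0]  # omega*t = pi/2 is a quarter orbit

monopoles = [sum(masses) for _ in times]
dipoles = [mass_dipole(masses, binary_positions(a, omega, t)) for t in times]
i_xx = [second_moment(masses, binary_positions(a, omega, t))[0][0] for t in times]
print(monopoles, dipoles, i_xx)
```

The monopole and dipole stay fixed throughout the orbit, while $I_{xx}$ swings between $2 M a^{2}$ and zero, which is why the quadrupole is the lowest moment available to radiate.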
In expanding the metric close to the source in the weak-field regime, the 00 component can be expanded as a sum of the mass moments
where $\mathcal{I}_{\ell}$ is the $\ell$th mass moment and $a_{i}$ are a set of constants. Similarly, the $0 j$ components of the metric can be expanded in terms of the current moments
where $\mathcal{S}_{\ell}$ are current moments and $b_{i}$ are constants. For a source of linear dimension $L$, we would expect on simple dimensional grounds that $\mathcal{I}_{\ell} \propto M L^{\ell}$ and $\mathcal{S}_{\ell} \propto M v L^{\ell}$, where $M$ is mass and $v$ is velocity. We infer that the leading-order time-varying contribution to $g_{00}$ is from the quadrupolar mass term $\mathcal{I}_{2}$, with the other terms contributing higher order corrections. This turns out to be the case.
46.3 Quadrupolar radiation
We will now derive a description of the quadrupolar radiation directly. We rewrite our wave equation with a source of radiation, so that
\begin{equation*}
-\partial^{2} \bar{h}_{\mu \nu}=16 \pi G T_{\mu \nu} \tag{46.39}
\end{equation*}
where the factor of $G$ has been restored. By analogy with the electromagnetic case,${ }^{12}$ we can write down the solution to this weak-field gravitation equation as
Assuming that the source is compact, so that it is concentrated in a small region $\Sigma$ of linear size $r$ a distance $R \gg r$ away, the spatial components $\bar{h}_{i j}$ are given by
The energy-momentum tensor is subject to a conservation law $\partial_{\mu} T^{\mu \nu}=0$ (or equivalently $T^{\mu \nu}{ }_{, \mu}=0$) and including that allows${ }^{13}$ us to rewrite eqn 46.42 as
The integral is just the second moment of the mass distribution, which is (in energy units) the moment of inertia tensor $I_{i j}$, so we can write this equation in the simplified form
This is known as the Einstein quadrupole formula.${ }^{14}$ Note that the moment of inertia tensor $I_{i j}=\int_{\Sigma} \mathrm{d}^{3} y\, \rho\, y_{i} y_{j}$ differs from the quadrupole moment $Q_{i j}=\int_{\Sigma} \mathrm{d}^{3} y\, \rho\left(y_{i} y_{j}-\frac{1}{3} r^{2} \delta_{i j}\right)=I_{i j}-\frac{1}{3} \delta_{i j} \operatorname{Tr} I$ solely by its trace. Since we are working in the transverse-traceless gauge, we are insensitive to the trace and hence we can think of the moment of inertia as a quadrupole moment.
Example 46.9
Let's put some numbers in to see how big an effect this could be. With a source at, say, $R=100 \mathrm{~Mpc}$ away from us, consisting of a pair of black holes, each of mass $M=30 M_{\odot}$, orbiting each other at $f=\omega /(2 \pi)=10 \mathrm{~Hz}$ and separated by $2 a=2000 \mathrm{~km}$, and using $\ddot{I} \approx 4 M a^{2} \omega^{2}$, then
\begin{equation*}
|\bar{h}| \approx \frac{2 G}{c^{4} R} 4 M a^{2} \omega^{2}=\frac{32 \pi^{2} G M a^{2} f^{2}}{c^{4} R} \approx 5 \times 10^{-21} \tag{46.46}
\end{equation*}
This will be a small effect!
${ }^{15}$ After all, the energy-momentum of the gravitational field has no real meaning locally since you can always transform to a freely falling frame and gravity disappears.
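The estimate in eqn 46.46 is easy to reproduce. The sketch below is an illustration, not a precision calculation: it assumes that the separation of the two black holes is $2a=2000$ km (so $a=1000$ km), uses rounded SI values for $G$, $c$, the solar mass and the megaparsec, and the function name `strain_estimate` is a hypothetical helper.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m s^-1 (rounded)
M_SUN = 1.989e30   # solar mass, kg (rounded)
MPC = 3.086e22     # one megaparsec, m (rounded)


def strain_estimate(mass, a, f, distance):
    """|h| ~ (2G / (c^4 R)) * 4 M a^2 omega^2 with omega = 2*pi*f."""
    omega = 2.0 * math.pi * f
    return 2.0 * G / (C ** 4 * distance) * 4.0 * mass * a ** 2 * omega ** 2


h = strain_estimate(
    mass=30.0 * M_SUN,     # each black hole
    a=1.0e6,               # orbital radius in metres, assuming 2a = 2000 km
    f=10.0,                # orbital frequency in Hz
    distance=100.0 * MPC,  # distance to the source
)
print(f"{h:.2e}")
```

With these inputs the strain comes out at the $5 \times 10^{-21}$ level, in line with eqn 46.46; doubling $a$ would quadruple the result, which shows how sensitive the estimate is to the assumed orbital radius.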
The value of $\bar{h} \approx 10^{-21}$ is very small, and much tinier even than the Newtonian potential on the surface of the Earth, where $\left|h_{00}\right|=2 G M_{\oplus} /\left(R_{\oplus} c^{2}\right) \approx 10^{-9}$, and this highlights an issue with our approach so far. The gravitational waves that are detected in experiments to date have frequencies in the tens of Hz to a few kHz range and consequently have wavelengths (tens to thousands of km) which are short compared to a terrestrial scale. These ripples in spacetime are superimposed on a larger, more slowly varying background due to astrophysical objects (including the astrophysical object on which a gravitational detector might be mounted, i.e. the Earth!). We have described the waves using a metric $g_{\mu \nu}=\eta_{\mu \nu}+h_{\mu \nu}$, expanding around a flat spacetime, whereas we probably really should write something like
where $g_{\mu \nu}^{\mathrm{b}}$ describes some background curved metric. However, even this may not be enough as it is not obvious which contributions should be background and which should be due to the gravitational waves. Moreover, as we will see, gravitational waves carry energy and momentum and so this is also going to act as a source of curvature of spacetime. Our linearized theory has ignored this effect, and so one way of including this is to say that our linearized Einstein tensor $G_{\mu \nu}^{(1)}$ is modified to
where $T_{\mu \nu}$ is due to matter and $t_{\mu \nu}$ is due to the effect of the gravitational field itself. Our exact Einstein equation is
where the sum is over a linear approximation $G_{\mu \nu}^{(1)}$ (linear in $h_{\mu \nu}$), a quadratic approximation $G_{\mu \nu}^{(2)}$ (quadratic in $h_{\mu \nu}$), etc. If we truncate the series at the quadratic term, then these equations suggest that
It turns out that this quantity is not gauge invariant, and you probably wouldn't expect it to be.${ }^{15}$ The trick is to average this quantity over a region of spacetime that is spatially larger than the wavelength of a gravitational wave and temporally larger than the reciprocal of a gravitational-wave frequency, thereby capturing the curvature of the background spacetime. Thus, we arrive at
where the angle brackets denote this averaging process and we have restored the factors of $G$ and $c$.
46.4 Radiated energy and power
The evaluation of an expression for $t_{\mu \nu}$ from eqn 46.52 is rather tedious, but in the transverse-traceless gauge it produces${ }^{16}$ the rather pleasing result
where we have used the Einstein quadrupole formula, eqn 46.44. We can work out the total flux of power from our source, namely the energy passing per second through a spherical surface of radius $R$, using the fact that the energy inside a volume $V$ is $\int \mathrm{d}^{3} x\, T^{00}$, so that the rate of change of energy is
\begin{equation*}
\frac{\mathrm{d}}{\mathrm{~d} t} \int \mathrm{~d}^{3} x T^{00}=\int \mathrm{d}^{3} x \partial_{t} T^{00}=-c \int \mathrm{~d}^{3} x \partial_{i} T^{0 i} \tag{46.55}
\end{equation*}
where we have used $\partial_{\mu} T^{\mu \nu}=0$ and $\partial_{t}=c\, \partial_{0}$. The energy carried by the gravitational wave is the negative of this (because energy lost inside the volume $V$ is carried away by the gravitational waves), so together with Stokes' theorem we have the rate of change of energy emitted by quadrupolar waves as an integral over the surface $S$ and hence
where the final result is an integral over solid angle (inside a sphere of radius $R$). In the present case, we are working in terms of $t_{\mu \nu}$ due to the effect of the gravitational field, and therefore we need $t^{0 r}$ in eqn 46.56. We can get this from eqn 46.53, but we also can use the fact that $\bar{h}_{i j}$ is a function of $t-R / c$ and hence
The prefactor $G /\left(5 c^{5}\right)$ is extremely small and hence most potential sources of gravitational radiation produce only a very weak emitted power.${ }^{20}$
${ }^{16}$ The calculation is rather tedious, but it is laid out in detail in Exercise 46.2 for any reader who wishes to follow it through.
${ }^{17}$ When we are just focussing on spatial vectors, the distinction between upstairs and downstairs indices becomes unnecessary, and following the lead of most authors in this field, we will write $\left\langle\dddot{I}_{i j}^{\mathrm{TT}} \dddot{I}_{i j}^{\mathrm{TT}}\right\rangle$ rather than $\left\langle\dddot{I}_{i j}^{\mathrm{TT}} \dddot{I}_{\mathrm{TT}}^{i j}\right\rangle$, which is less fussy. The Einstein summation convention still holds, so $i$ and $j$ are summed over.
${ }^{18}$ See Exercise 46.5(d).
${ }^{19}$ See Exercise 46.5(e). Note that this is the energy flux associated with the gravitational waves. The rate of change of energy inside the volume is minus this.
${ }^{20}$ Moreover, note that a spherically symmetric source (such as a rotating star) has zero $\left\langle\dddot{I}_{i j} \dddot{I}_{i j}\right\rangle$ (its moment of inertia does not change with time) and will not emit gravitational waves.
${ }^{21}$ As shown in Exercise 46.4, the emitted power for most orbiting systems is tiny. To get some sizeable, and hence potentially detectable, emitted power, we need some pretty dramatic situations, such as the two closely spaced black holes whirling around each other at ferocious speed considered in this example. We could have chosen two neutron stars, but we wanted to be even more dramatic! Note that $L \propto \omega^{6}$, so high-frequency signals contain much more power.
${ }^{22}$ The derivation is in Maggiore, Section 3.3.3. Note that this expression for the angular momentum implies that angular momentum will not be emitted if the source is axisymmetric.
Example 46.10
Returning to the numbers${ }^{21}$ in Example 46.9, we would estimate (using $\left\langle\dddot{I}_{i j} \dddot{I}_{i j}\right\rangle=128 M^{2} a^{4} \omega^{6}$, as shown in eqn 46.106 from Exercise 46.6) that the gravitational luminosity $L$ of our binary black-hole system would be
\begin{equation*}
L=\frac{128 G}{5 c^{5}} M^{2} a^{4} \omega^{6}, \tag{46.61}
\end{equation*}
although the power flux on Earth works out to be less than $1 \mathrm{~mW} \mathrm{~m}^{-2}$.
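As a rough numerical companion to this estimate, the following sketch evaluates $L=\frac{128 G}{5 c^{5}} M^{2} a^{4} \omega^{6}$ together with the Newtonian orbital frequency of Exercise 46.6(a). The masses, orbital radius and source distance are illustrative assumptions (they are not the numbers of Example 46.9), the function names are our own, and the flux is obtained by spreading the power uniformly over a sphere, which ignores the angular pattern of the radiation.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m s^-1
M_SUN = 1.989e30   # solar mass, kg
MPC = 3.0857e22    # one megaparsec, m


def orbital_omega(M, a):
    """Newtonian angular frequency for two equal masses M, each on a circle of radius a."""
    return math.sqrt(G * M / (4.0 * a**3))


def gw_luminosity(M, a):
    """Quadrupole luminosity L = (128 G / 5 c^5) M^2 a^4 omega^6, in watts."""
    w = orbital_omega(M, a)
    return 128.0 * G / (5.0 * C**5) * M**2 * a**4 * w**6


def flux_at_distance(L, d):
    """Power per unit area if L is spread uniformly over a sphere of radius d."""
    return L / (4.0 * math.pi * d**2)


# Assumed, purely illustrative inputs
M = 30.0 * M_SUN   # mass of each black hole
a = 5.0e5          # orbital radius, m
d = 400.0 * MPC    # distance to the observer

L = gw_luminosity(M, a)
print(f"L    = {L:.2e} W")
print(f"flux = {flux_at_distance(L, d):.2e} W m^-2")
```

With these assumed inputs the luminosity is enormous, yet the flux at the observer remains well below $1 \mathrm{~mW} \mathrm{~m}^{-2}$. Substituting the Kepler relation shows that $L \propto M^{5} / a^{5}$, so the result is very sensitive to the orbital radius chosen.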
One can also show that not only is energy carried away by gravitational waves but also angular momentum. By analogy with eqn 46.56, the rate of change of angular momentum is given by
For both eqn 46.60 and eqn 46.63 the quantities $\dddot{I}_{i j}$ are evaluated at the retarded time $t_{\mathrm{r}}=t-R / c$.
Example 46.11
An orbiting pair of black holes has a gravitational-wave luminosity $L$ given by eqn 46.61 so that $L=\frac{128 G}{5 c^{5}} M^{2} a^{4} \omega^{6}$, but the two objects (assuming Newtonian mechanics holds) have a gravitational potential energy equal to $-G M^{2} /(2 a)$ and a kinetic energy of $2 \times \frac{1}{2} M v^{2}=G M^{2} /(4 a)$ so that the total energy is $E=-G M^{2} /(4 a)=-M a^{2} \omega^{2}$ and
Note that the two orbiting black holes are in a bound state so that the total energy $E$ is negative. Therefore, when energy is lost via emission of gravitational waves the energy becomes more negative and hence $|E|$ is larger. This causes the orbital period $P$ to decrease (hence the minus sign in eqn 46.68) and the orbiting becomes faster. This speed-up of the orbital motion (called spin-up) was observed in a binary pulsar system in 1974 by Hulse and Taylor and was the first (indirect) discovery of gravitational waves (see Fig. 46.3). We receive radio emission only from one of the neutron stars, but the Doppler shift of the radio pulses allows us to estimate the orbital period as 7.75 hours and to measure that the orbit is very slowly speeding up, with a decrease of orbital period of 76.5 microseconds per year. As $|E|$ increases the two black holes get closer together.
… gravitational waves. The only non-zero component is perpendicular to the orbital plane and yields${ }^{23}$
The angular momentum of our orbiting pair of black holes is $J=2 M a^{2} \omega=\sqrt{G M^{3} a}=-2 E / \omega$. Also, eliminating $a$ and $\omega$ we have $E=-G^{2} M^{5} /\left(4 J^{2}\right)$ and hence $\mathrm{d} J / \mathrm{d} t=-(J / 2 E)\, \mathrm{d} E / \mathrm{d} t$ and therefore substituting in our expression for $\mathrm{d} E / \mathrm{d} t=-L$ gives
\begin{equation*}
\frac{\mathrm{d} J}{\mathrm{~d} t}=-\frac{L}{\omega}=-\frac{128 G}{5 c^{5}} M^{2} a^{4} \omega^{5},
\end{equation*}
in agreement with eqn 46.69 (the sign change being that the angular momentum lost by the orbiting pair is carried off by the gravitational waves).
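The Newtonian bookkeeping used in these two examples can be checked numerically. The sketch below is a minimal illustration, assuming point masses on a slowly shrinking circular orbit with $\mathrm{d} E / \mathrm{d} t=-L$; the function names and the sample values of $M$ and $a$ are our own choices. It confirms that the angular momentum is lost at the rate $L / \omega$ and estimates how long the orbit takes to shrink to zero radius in this approximation (which cannot be trusted in the final, relativistic stage).

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m s^-1
M_SUN = 1.989e30   # solar mass, kg


def omega(M, a):
    """Orbital angular frequency, omega^2 = G M / (4 a^3)."""
    return math.sqrt(G * M / (4.0 * a**3))


def energy(M, a):
    """Total Newtonian energy E = -G M^2 / (4 a)."""
    return -G * M**2 / (4.0 * a)


def angular_momentum(M, a):
    """Orbital angular momentum J = 2 M a^2 omega."""
    return 2.0 * M * a**2 * omega(M, a)


def luminosity(M, a):
    """Gravitational-wave luminosity L = (128 G / 5 c^5) M^2 a^4 omega^6."""
    return 128.0 * G / (5.0 * C**5) * M**2 * a**4 * omega(M, a)**6


def da_dt(M, a):
    """Shrinkage rate of the orbit from dE/dt = -L, using dE/da = G M^2 / (4 a^2)."""
    return -luminosity(M, a) * 4.0 * a**2 / (G * M**2)


def dJ_dt(M, a):
    """Rate of change of J; since J is proportional to a^(1/2), dJ/da = J / (2 a)."""
    return angular_momentum(M, a) / (2.0 * a) * da_dt(M, a)


def merger_time(M, a0):
    """Time for a to fall from a0 to zero, integrating da/dt = -(8/5) G^3 M^3 / (c^5 a^3)."""
    return 5.0 * C**5 * a0**4 / (32.0 * G**3 * M**3)


M, a = 30.0 * M_SUN, 5.0e5   # assumed illustrative values
print("dJ/dt      =", dJ_dt(M, a))
print("-L/omega   =", -luminosity(M, a) / omega(M, a))
print("t_merge/s  =", merger_time(M, a))
```

The first two printed numbers agree, which is the statement that the orbit loses angular momentum at the rate at which the waves carry it away.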
Our treatment of these astrophysical cases has assumed Newtonian dynamics, so we would not expect it to apply in the final stage of the inspiral of two compact objects, as their orbital periods drop rapidly and their orbital velocities become relativistic. Our approach has also assumed a linearized approximation to general relativity and here we will encounter a big difference with electromagnetism. In the electromagnetic case, the force-mediating particle (the photon) and its associated wave (the electromagnetic wave) have no charge and so there is no nonlinear effect in free space. For gravity, our gravitational waves do transmit energy (i.e. mass) which is itself a source of gravity; thus, our theory is inherently nonlinear and we take a brief moment in the next section to examine the consequences of this.
46.5 An exact solution
So far we have only examined gravitational waves within the linear approximation. Will they survive in the exact, nonlinear theory of gravitation? We can show that they shall.
We argued in Section 46.2 for solutions as a function of $\phi=(-\omega t+k z)$ with $\omega=k^{z}=|k|$, since gravitational waves are described by null velocity vectors. Since we expect gravitational waves to be null, it is useful to work in light-cone coordinates $u=t-z$ and $v=t+z$ in which the Minkowski metric is written in terms of the line element as
Fig. 46.3 The orbital decay of the Hulse-Taylor binary pulsar system PSR B1913+16, together with the prediction from general relativity. Our treatment has assumed circular orbits of two equal masses, but the curve here assumes the correct elliptical orbital parameters. This plot shows cumulative shifts in the time of periastron. [Figure reproduced from J. M. Weisberg and Y. Huang, Astrophysical Journal 829, 55 (2016), doi:10.3847/0004-637X/829/1/55 © American Astronomical Society.]
${ }^{23}$ See Exercise 46.7.
We can therefore look for an exact, wavelike solution to the Einstein field equation with metric
If $F(u)=1+\varepsilon(u)$, where the function $\varepsilon(u)$ is assumed small, then eqn 46.75 can be solved by $G(u)=1-\varepsilon(u)$, since we obtain
Since, in this case, we have $h_{x x}=\varepsilon(u)$ and $h_{y y}=-\varepsilon(u)$, this is just the case shown in Fig. 46.1(b) of a + polarized wave.
We conclude that the exact, nonlinear case therefore supports solutions similar to the ones found for the linear, weak-field theory.
46.6 The discovery of gravitational waves
For a hundred years gravitational waves were, apart from the impressive but indirect evidence from the spin-up of the Hulse-Taylor binary pulsar, a theoretical construct. It was believed that they were there, but they had not been directly detected. That has all changed due to the extraordinary results that have been obtained from ground-based gravitational-wave observatories. These are designed to probe the relatively high-frequency portion of the gravitational-wave spectrum, from about 10 Hz to about 10 kHz. This spectrum is dominated by signals originating from stellar-mass compact sources, principally coalescing binary black hole and neutron star systems.
In ground-based observatories, the idea is to detect, using laser interferometry, the alternating motion of masses produced by a passing gravitational wave. The key here is to use the sensitivity of interference to optical path length to detect the oscillatory motion of the masses. We've seen how gravitational waves lead to quadrupolar oscillations of masses. To see the + polarization for a wave propagating in the $z$-direction, for example, we would need to access the relative motion of at least two masses, such as one separated along $x$ and one separated along $y$. In order to do this, we use the masses to form a Michelson${ }^{24}$ interferometer in the $x$-$y$ plane, as shown in Fig. 46.4. Laser light is split by a beam-splitting mirror and travels along the two arms of the interferometer. The lengths of the arms are defined by mirrored masses, which reflect the light back to where the beams are recombined and detected via a photodetector as shown. The bad news is that the effect is small. Even for merging neutron stars or collapsing supernovae, we only expect fractional changes in the displacement of the mirrors in the experiment of $10^{-21}$. In order to see an oscillation from a gravitational wave, a typical photon must remain in the system for at least half the period of the gravitational wave, which turns out to be of order milliseconds. As a result, we require interferometers with very long arms and exceedingly sensitive detection technologies. This is far from trivial. It took several decades to develop the technology to achieve this extraordinary level of sensitivity.
Example 46.13
The intensity at the photodetector is determined by the phase shift between the light combined from each arm of the interferometer. Take the arms to have lengths $L_{x}$ and $L_{y}$. If the arm lengths vary by respective amounts $\delta x$ and $\delta y$ then the phase shift is $\Delta \phi=\omega_{0}(2 \delta y-2 \delta x)$, where $\omega_{0}$ is the laser frequency. Using eqn 46.33 we can rewrite this as
allowing us to see the relationship between the $\boldsymbol{h}$ field and the phase shift. Assuming the arms are roughly the same length $L$ we have that the intensity measured is
\begin{equation*}
I \propto \Delta \phi(t) \approx 2 L h_{+}(t) \tag{46.78}
\end{equation*}
We conclude that the length of the arms must be maximized to have a chance of seeing the effect.
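To get a feel for the sizes involved, the sketch below evaluates the phase shift $\Delta \phi=\omega_{0}(2 \delta y-2 \delta x)$ for a + polarized wave. It rests on several assumptions that are not taken from the text: the arm responses are taken to be $\delta x=+\frac{1}{2} h_{+} L$ and $\delta y=-\frac{1}{2} h_{+} L$, factors of $c$ are restored so that $\omega_{0}$ becomes the wavenumber $2 \pi / \lambda$, the laser wavelength is set to 1064 nm, and the two arm lengths (4 km and an effective 1200 km) are used as sample values. The function names are hypothetical.

```python
import math


def arm_length_changes(h_plus, L):
    """Assumed + polarization response: the x arm stretches by h L / 2, the y arm shrinks by h L / 2."""
    return 0.5 * h_plus * L, -0.5 * h_plus * L


def phase_shift(h_plus, L, wavelength):
    """Phase difference (radians) between the two arms, Delta phi = k0 (2 dy - 2 dx)."""
    dx, dy = arm_length_changes(h_plus, L)
    k0 = 2.0 * math.pi / wavelength   # omega_0 / c
    return k0 * (2.0 * dy - 2.0 * dx)


h = 1e-21                 # fractional strain quoted in the text
wavelength = 1.064e-6     # assumed laser wavelength, m

for L in (4.0e3, 1.2e6):  # physical arm length and assumed effective optical length, m
    dx, _ = arm_length_changes(h, L)
    print(f"L = {L:.1e} m: dx = {dx:.1e} m, |dphi| = {abs(phase_shift(h, L, wavelength)):.1e} rad")
```

The phase shift scales linearly with the arm length, which is why lengthening the effective optical path by a large factor helps by the same factor.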
LIGO stands for the Laser Interferometer Gravitational-Wave Observatory. It comprises two identical detectors: one interferometer in Livingston, Louisiana and one in Hanford, Washington. The arms of the interferometers are 4 km in length (for comparison, the Michelson-Morley experiment involved arms of length 1.3 m). However, these still cannot introduce a long-enough optical path difference to detect the oscillations from the tiny displacements caused by the perturbations to the metric caused by gravitational waves. As a result, Fabry-Perot cavities are also mounted along the arms, increasing the effective optical length of the arms up to $\approx 1200 \mathrm{~km}$. The merger of two black holes provided the source of gravitational waves that were detected on 14 September, 2015 by both of the twin LIGO interferometers (Fig. 46.5). This particular gravitational-wave signal is thought to have been due to the inward spiral and merger of a pair of black holes, estimated to be around $36 M_{\odot}$ and $29 M_{\odot}$, and the subsequent 'ringdown' of the single resulting black hole of mass $62 M_{\odot}$, with the remaining $3 M_{\odot} c^{2}$ energy radiated as gravitational waves. This measurement and subsequent ones have provided some of the most stringent experimental verifications of general relativity.
${ }^{24}$ Albert A. Michelson (1852-1931). The Michelson-Morley experiment was, of course, very important in the development of special relativity, although it didn't seem to be one of Einstein's main motivations. It did, however, represent the key test of a $v^{2} / c^{2}$ correction to the pre-relativistic theory, predicted by Lorentz on the strength of the ether theory using a locally defined time. The null result of the experiment led Lorentz to propose length contraction. See Cheng for an accessible account of the history.
Fig. 46.4 (a) A schematic of a gravitational-wave detector. (b,c) The gravitational wave's effect on a test mass system and its corresponding effect on the arms of the interferometer.
Fig. 46.5 The gravitational-wave event GW150914 observed by the LIGO Hanford (H1, left column panels) and Livingston (L1, right column panels) detectors. Times are shown relative to 14 September, 2015 at 09:50:45 UTC. [From B. P. Abbott et al. Phys. Rev. Lett. 116, 061102 (2016), DOI: 10.1103/PhysRevLett.116.061102, published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License.]
These results have stimulated the building of further gravitational wave observatories, including the Einstein Telescope and Cosmic Explorer which are planned to achieve an order of magnitude increase in sensitivity and would therefore be able to study the evolution of compact objects in the early Universe. However, other projects are aimed at building an observatory a long way from the ground. The proposed space-based Laser Interferometer Space Antenna (LISA) will be able to explore much lower frequency gravitational waves (from around $100\ \mu\mathrm{Hz}$ to 100 mHz). It is expected to be capable of detecting at very high redshift the first seed black holes formed, as well as intermediate-mass and 'light' super-massive coalescing black hole systems in the $10^{2}$-$10^{7} M_{\odot}$ range. It should therefore be able to follow the evolution of black holes right from the early Universe.
To get to really low frequencies, the best technology for detecting gravitational waves is the use of pulsar timing arrays. These work in the nanohertz to microhertz frequency band and can be used to detect gravitational-wave remnants from the past mergers of supermassive black holes. The basic idea is that rather than using laser interferometers (as used in both LIGO and LISA), one measures the pulse arrival time at Earth from an array of millisecond pulsars (i.e. rapidly rotating neutron stars). These pulsars have extremely regular and stable periods and act as ideal timing sources. A gravitational wave emitted from some astrophysical source will pass the pulsar and the Earth and so this will produce two perturbations on the signal received from the pulsar: one from spacetime variations at the pulsar and the other from spacetime variations on Earth. Data from the pulsar arrays have to be accumulated over many years, so these experiments are not quick.
The stochastic background of remnant primordial gravitational waves produced during the Big Bang will dominate the gravitational-wave spectrum down to approximately $10^{-18} \mathrm{~Hz}$ and, depending on the cosmological model, some of this background may lie in a region that could be probed by the technologies discussed above. This is an open question, but the modern era of gravitational-wave astronomy that has just begun looks like it is going to have an exciting and productive future in the coming decades.
We have seen in this chapter how general relativity predicts the existence of wave-like excitations in the gravitational field. In the next chapter, we examine gravitational waves from another point of view: that of quantum fields, where these waves are quantized into force-carrying particles known as gravitons.
Chapter summary
A gravitational wave solution to the linearized Einstein equation has the form $\bar{h}_{\mu \nu}=\operatorname{Re}\left[A_{\mu \nu} \mathrm{e}^{\mathrm{i} \boldsymbol{k} \cdot \boldsymbol{x}}\right]$. The waves have null $\boldsymbol{k}$ and amplitude orthogonal to the direction of propagation.
Transverse-traceless gauge can be used to simplify the solutions, leading to + and $\times$ polarizations whose patterns are rotated by $45^{\circ}$ with respect to each other.
There are wave solutions containing nonlinear terms that solve the Einstein equations.
The gravitational wave luminosity of a source is given by
Gravitational wave astronomy is a rapidly growing area in astrophysics. The laser interferometers that make up the ground-based LIGO or the space-based LISA have, or will have, extraordinary sensitivity. Lower frequency gravitational waves are expected to be detected by pulsar timing arrays.
Exercises
(46.1) Verify the components in eqn 46.30 .
(46.2) (a) Recall that the connection coefficients are
and the Ricci tensor is
\begin{equation*}
R_{\mu \nu}=\partial_{\alpha} \Gamma^{\alpha}{ }_{\nu \mu}-\partial_{\nu} \Gamma^{\alpha}{ }_{\alpha \mu}+\Gamma^{\beta}{ }_{\beta \alpha} \Gamma^{\alpha}{ }_{\nu \mu}-\Gamma^{\beta}{ }_{\nu \alpha} \Gamma^{\alpha}{ }_{\beta \mu} . \tag{46.80}
\end{equation*}
In linearized gravity, we take $g_{\mu \nu}=\eta_{\mu \nu}+h_{\mu \nu}$ so that $g^{\mu \nu}=\eta^{\mu \nu}-h^{\mu \nu}$ (and hence $g_{\mu \alpha} g^{\alpha \nu}=\delta_{\mu}^{\nu}+\mathrm{O}\left(h^{2}\right)$). The tensor field $h_{\mu \nu}$ is symmetric, so we are free to swap indices around since $h_{\mu \nu}=h_{\nu \mu}$. Hence, show that the quadratic contribution to the Ricci tensor is
where $h=h_{\alpha}{ }^{\alpha}$.
(c) Show that the only two terms that survive in this expression are the first two, using
Tracelessness ($h=0$) [this annihilates terms 11, 12, and 13].
The gauge condition $\partial_{\mu} h^{\mu \nu}=0$ [this annihilates 4, 8, 9, and 10].
Any divergence vanishes on the boundary (after averaging over a volume). [This results in 3, 5, and 7 vanishing once you also apply the gauge condition, and 6 similarly going once you use the field equations $\partial^{2} h_{\alpha \mu}=0$.]
Show further that the Ricci scalar vanishes using these conditions.
(d) Hence, show that
(46.4) In Newtonian gravitation, typical velocities and accelerations are $v^{2} \approx G m / r$ and $a \approx G m / r^{2}$, respectively.
(a) Using the result of the previous problem, show that we expect
\begin{equation*}
h \approx \frac{G m}{R} \cdot v^{2} \tag{46.92}
\end{equation*}
(b) The second-order terms in the weak-field expansion suggest the gravitational energy-density $t$ varies as $t \approx(\dot{h})^{2} / G$. Show that
\begin{equation*}
t \approx \frac{1}{R} \frac{G^{4} m^{5}}{r^{5}} \tag{46.93}
\end{equation*}
(c) Integrating over a sphere of radius $R$ and restoring factors, show further that the power radiated via gravitational radiation is approximately
\begin{equation*}
P \approx \frac{G^{4} m^{5}}{r^{5} c^{5}} \tag{46.94}
\end{equation*}
(d) Estimate the power radiated by (i) the solar system, (ii) a collapsing binary star, formed from two stellar-mass black holes, and (iii) a fist, shaken in anger.
(46.5) In Exercise 30.7, we considered a projection operator $P_{i j}$ which projected a vector onto a unit spatial vector $\vec{n}=\left(n_{1}, n_{2}, n_{3}\right)$. We now want to find a traceless version of the same thing.
(a) Show that $P_{i j}=\delta_{i j}-n_{i} n_{j}$ acts as a projection operator on a vector $\vec{v}$ so that $n_{i} P_{i j} v_{j}=0$. Show also that $\operatorname{Tr} P=P_{i i}=2$ and $P_{i j} P_{j k}=P_{i k}$.
(b) The action of this projection operator on a tensor $M_{i j}$ requires it to be used twice, so that
and show that $M_{k \ell}^{\prime} n_{k}=M_{k \ell}^{\prime} n_{\ell}=0$. However, it is not traceless, and you can show this by proving that $M_{k k}^{\prime}=\operatorname{Tr}(P M)$.
(c) The solution is to use the traceless transverse projection operator
This can be used to prove eqn 46.60.
(46.6) Consider two black holes of the same mass $M$ in a circular orbit of radius $a$ around their common centre of mass. At a time $t$ the black holes are at positions with Cartesian coordinates
\begin{equation*}
(x, y, z)=(a \cos \omega t, a \sin \omega t, 0) \tag{46.101}
\end{equation*}
and
\begin{equation*}
(x, y, z)=(-a \cos \omega t,-a \sin \omega t, 0) \tag{46.102}
\end{equation*}
respectively.
(a) Show that the angular frequency of the circular motion is given by $\omega=\left(G M / 4 a^{3}\right)^{\frac{1}{2}}$.
(b) Show that the moment of inertia tensor is given by
\begin{aligned}
I^{i j} & =2 M a^{2}\left(\begin{array}{ccc}
\cos ^{2} \omega t & \cos \omega t \sin \omega t & 0 \\
\cos \omega t \sin \omega t & \sin ^{2} \omega t & 0 \\
0 & 0 & 0
\end{array}\right) \\
& =M a^{2}\left(\begin{array}{ccc}
1+\cos 2 \omega t & \sin 2 \omega t & 0 \\
\sin 2 \omega t & 1-\cos 2 \omega t & 0 \\
0 & 0 & 0
\end{array}\right)
\end{aligned}
(c) Show further that the oscillating gravitational field a distance $R$ away is given by
where the retarded time is $t_{r}=t-R$. This equation tells us that the oscillating field has a frequency twice that of the orbit of the binary system. It is in the form of eqn 46.21 and so also describes the gravitational radiation emitted in the $z$-direction.
(d) Show finally that
\dddot{I}^{i j}=8 M a^{2} \omega^{3}\left(\begin{array}{ccc}
\sin 2 \omega t & -\cos 2 \omega t & 0 \\
-\cos 2 \omega t & -\sin 2 \omega t & 0 \\
0 & 0 & 0
\end{array}\right)
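A quick numerical cross-check of parts (b) and (d) is possible without any algebra. The sketch below is only an illustration, assuming the definition $I^{i j}=\sum m x^{i} x^{j}$ for the two point masses; it differentiates the tensor three times by central finite differences (the helper names and the step size are our own choices) and compares the result with the matrix above and with the contraction $\dddot{I}_{i j} \dddot{I}_{i j}=128 M^{2} a^{4} \omega^{6}$.

```python
import math


def inertia(t, M, a, w):
    """I^{ij} = sum over both masses of m x^i x^j, for masses at +/-(a cos wt, a sin wt, 0)."""
    x = (a * math.cos(w * t), a * math.sin(w * t), 0.0)
    # The two masses sit at +x and -x, so each contributes M x^i x^j.
    return [[2.0 * M * x[i] * x[j] for j in range(3)] for i in range(3)]


def third_derivative(f, t, h):
    """Central finite-difference estimate of f'''(t), accurate to O(h^2)."""
    return (f(t + 2 * h) - 2 * f(t + h) + 2 * f(t - h) - f(t - 2 * h)) / (2 * h**3)


def dddot_inertia(t, M, a, w, h=1e-3):
    """Numerical third time derivative of every component of I^{ij}."""
    return [
        [third_derivative(lambda s: inertia(s, M, a, w)[i][j], t, h) for j in range(3)]
        for i in range(3)
    ]


def contraction(t, M, a, w):
    """Sum over i and j of (dddot I_ij)^2."""
    D = dddot_inertia(t, M, a, w)
    return sum(D[i][j] ** 2 for i in range(3) for j in range(3))


M, a, w, t = 2.0, 1.5, 0.7, 0.3   # arbitrary test values in units with G = c = 1
print("numerical :", contraction(t, M, a, w))
print("analytic  :", 128.0 * M**2 * a**4 * w**6)
```

The contraction is independent of $t$, so no time average is actually needed for a circular orbit.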
(46.7) Using the results of the previous problem, derive eqn 46.69 using the expression given in eqn 46.63.
${ }^{1}$ This is a question Richard Feynman (1918-1988) asked himself in his course on gravity. Our approach in this chapter follows Feynman's resulting Lectures on Gravitation (1995). This was also one of the early paths taken by several scientists (including Feynman) attempting to formulate a quantum theory of gravity. Although informative in many ways, it would ultimately prove unsuccessful. For an introduction to the history, see A. Ashtekar, Quantum Gravity, arXiv:gr-qc/0410054v2 (2004).
${ }^{2}$ We won't assume any familiarity with the techniques of quantum field theory in this chapter.
$\curvearrowright$ Since this chapter discusses a hypothetical particle (i.e. the graviton) it can be skipped on a first reading.
The properties of gravitons
Though free to think and act, we are held together, like the stars in the firmament, with ties inseparable. These ties cannot be seen, but we can feel them. I cut myself in the finger, and it pains me: this finger is a part of me. I see a friend hurt, and it hurts me, too: my friend and I are one. And now I see stricken down an enemy, a lump of matter which, of all the lumps of matter in the universe, I care least for, and it still grieves me. Does this not prove that each of us is only part of a whole? Nikola Tesla (1856-1943)
Imagine a parallel world where civilization had formulated quantum field theory (QFT) but had no geometrical theory of gravitation. In seeking to describe gravity with the tools at hand, where would their reasoning take them?${ }^{1}$ We shall suggest in this chapter that gravitational excitations would be a natural place for them to start. This would lead to the idea of a graviton, a force-carrying particle derived from the quantization of gravitational waves.
In this chapter, we therefore pick up the discussion of the gravitational interactions between masses from the point of view of field theory. We shall work in the weak-field limit, where gravitation is described by a field $\boldsymbol{h}(x)$ and indices are raised and lowered by the Minkowski metric $\boldsymbol{\eta}$. Our plan is to work out as much as we can about gravitational waves by drawing on the concepts of QFT,${ }^{2}$ where the waves are quantized into force-carrying graviton particles. Although we do not yet have a quantum field theory of gravitation, we can still make progress using the tools from field theory and, indeed, some candidate theories of gravitation predict the existence of gravitons. We shall see that, as in the previous two chapters, we can gain a certain amount of insight by comparing gravitational waves to light waves, or in this quantum context, comparing photons to gravitons. The result will be a rather different way to think about gravitational interactions from the one we have considered thus far.
47.1 Force-carrying particles
One of the most interesting things about particles is that they interact with each other. Hideki Yukawa's great insight was his suggestion that this interaction process itself involves particles. These force-carrying particles are slightly different to their more familiar cousins with whom we're already acquainted. Yukawa's idea centres around one key notion:
Particles interact by exchanging virtual, force-carrying particles.
Recall that a virtual particle is one defined as existing 'off mass-shell'.${ }^{3}$ Interactions are described by field theories. The pattern we've seen in classical field theories (such as electromagnetism and gravitation) is (i) that sources of a field tell the field how to arrange itself; (ii) the field then acts on the sources, via virtual particles, telling them how to move. In quantum electrodynamics (or QED, which is the quantum upgrade of classical electromagnetism), the sources of the fields are electric charges and currents, arranged into a current 4-vector $\boldsymbol{J}$. The force-carrying particles are photons, which are the massless particle excitations of the electromagnetic field $\tilde{\boldsymbol{A}}(x)$. One explanation for the photon having zero mass is that a massless virtual particle is the only way to produce the Coulomb interaction potential $V(r) \propto 1 / r$. A massive force-carrying particle would lead to a potential that falls off more rapidly with distance, as examined in the next example.
Example 47.1
The role of mass can be understood by considering the mathematical form of the interaction mediated by Yukawa's force-carrying particles. The Yukawa potential is written as $U(\vec{r}) \propto-\frac{\mathrm{e}^{-\alpha m|\vec{r}|}}{4 \pi|\vec{r}|}$, where $m$ is the mass of a force-carrying particle and $\alpha$ is a parameter. This potential obeys the Green's function${ }^{4}$ equation
This equation is a form of the Klein-Gordon equation (see Chapter 40), which is the equation of motion for massive, spinless ($S=0$) particles. The effective potential representing the interaction mediated by such particles must also obey this equation of motion.
What is the analogous expression for electromagnetism? We know that the Coulomb potential $V(\vec{r}) \propto 1 /|\vec{r}|$ must obey Poisson's equation, which is the name we give the Green's function equation
We see that everything is consistent if we set $m=0$ for the case of electromagnetism (corresponding to a massless photon). Conversely, if $m \neq 0$, then the electromagnetic interaction would have an exponential contribution $\mathrm{e}^{-\alpha m|\vec{r}|}$, which it does not.
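Away from the origin, where the delta-function source plays no role, the claim of this example can be tested numerically. The sketch below is a minimal check, assuming the normalization $U(r)=-\mathrm{e}^{-\mu r} /(4 \pi r)$ with $\mu=\alpha m$ and using the radial form of the Laplacian for a spherically symmetric function, $\nabla^{2} U=\frac{1}{r} \frac{\mathrm{d}^{2}}{\mathrm{d} r^{2}}(r U)$. The function names, step size and sample values are our own.

```python
import math


def yukawa(r, mu):
    """Yukawa potential U(r) = -exp(-mu r) / (4 pi r), with mu = alpha * m; mu = 0 gives the Coulomb form."""
    return -math.exp(-mu * r) / (4.0 * math.pi * r)


def radial_laplacian(f, r, h=1e-4):
    """Laplacian of a spherically symmetric f(r): (1/r) d^2(r f)/dr^2 by central differences."""
    g = lambda s: s * f(s)
    return (g(r + h) - 2.0 * g(r) + g(r - h)) / (h * h * r)


r, mu = 1.3, 0.7   # arbitrary sample point and mass parameter

# Massive case: for r > 0 the potential satisfies (Laplacian - mu^2) U = 0.
lap = radial_laplacian(lambda s: yukawa(s, mu), r)
print("Laplacian U :", lap)
print("mu^2 U      :", mu**2 * yukawa(r, mu))

# Massless case: for r > 0 the Coulomb potential satisfies Laplace's equation.
print("Laplacian of 1/r form:", radial_laplacian(lambda s: yukawa(s, 0.0), r))
```

The exponential factor is exactly what converts Laplace's equation into the massive (Klein-Gordon-like) equation, so a potential that falls off purely as $1 / r$ is only compatible with $m=0$.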
A massless photon necessarily has only two polarization states, associated with the two polarizations of light. The photon also has a spin${ }^{5}$ $S=1$. We would now like to identify the analogous properties of the graviton, the particle excitation of the metric field of gravity that mediates the gravitational force. The gravitational potential varies as $\Phi(r) \propto 1 / r$ and so this constrains the mass of the virtual particle to be $m=0$. What is the spin of the graviton? A spin $S=1$ theory has the property that like charges repel and unlike charges attract.${ }^{6}$ This is incompatible with gravity, which is purely attractive. Even-integer spin exchange leads exclusively to forces of one sign (either purely attractive or purely repulsive), so the graviton must be one of $S=0,2,4, \ldots$ A scalar field $\phi(x)$ has $S=0$. However, an $S=0$ theory turns out to be too simple to capture gravitation${ }^{7}$ (it makes incorrect predictions for the energetics of gravitation for one thing). We shall therefore assume that the graviton is a $S=2$ particle and, as a result, must be described by a symmetric tensor field. In order to extract some properties of the graviton, we first take a side step and examine the properties of the photon within quantum field theory. We shall then discuss gravitation by making exactly the same mathematical steps, substituting gravitation for electromagnetism. This will reveal the form of the graviton interaction and show the role of graviton polarization.
${ }^{3}$ See Chapter 28. We saw that the argument is that quantum mechanics allows us to violate this classical dispersion relation, as long as we don't do it for too long. By invoking energy-time uncertainty $\Delta E \Delta t \sim \hbar$, we can say that particles of energy $E$ are allowed to exist off the mass-shell as long as they live for a short time $\Delta t \lesssim \hbar / E$. Virtual particles, therefore, must have a finite range since they can't live forever and they travel at finite velocity.
${ }^{4}$ A Green's function $G$ is the solution of a differential equation of the form $\hat{L} G=\delta$, where $\hat{L}$ is a linear operator and $\delta$ is a delta function.
${ }^{5}$ We summarize here the different spins and tensors associated with different field theories:
${ }^{6}$ Or vice versa, if the coupling constant has opposite sign.
${ }^{7}$ See Exercise 42.13.
Fig. 47.1 The exchange of a virtual photon causes two currents to interact.
${ }^{8}$ Owing to the symmetry of the interaction, which is reflected in the diagram in Fig. 47.1, each particle actually plays both the role of the scattered particle and that of the source of the scattering potential. However, the argument presented here will get us to the right answer.
Fig. 47.2 One current acting as the source of the field with which the other interacts.
${ }^{9}$ We can see the motivation for this by combining eqn 47.3 with the interaction Lagrangian $\mathcal{L}=A_{\mu}(x) J^{\mu}(x)$, which leads us to predict an interaction with the form $\boldsymbol{J}_{a} \frac{1}{\boldsymbol{k}^{2}} \boldsymbol{J}_{b}$. This is indeed the case.
47.2 Photon propagation and polarization
We know that two electrons have interacted if, on approaching each other, their motion is altered by each other's presence. This is a rough description of scattering, where incoming particles change their momentum states owing to interactions. The probability of scattering is encoded in a quantum mechanical amplitude, and it is this that we shall compute in this section. A useful method of understanding and computing these amplitudes makes use of Feynman diagrams. These are momentum-space cartoons of the scattering processes that encode the equations involved in a perturbation expansion of the underlying quantum field theory. The simplest Feynman diagram for electromagnetic electron-electron interactions is shown in Fig. 47.1. Conceptually, the diagram can be understood in terms of the current representing the motion of one electron (called the bb-particle for the sake of argument) being the source of the field tilde(A)(x)\tilde{\boldsymbol{A}}(x) with which the current J_(a)(x)J_{a}(x) representing the other electron (the aa-particle) interacts, as shown in Fig. 47.2. ^(8){ }^{8}
In general, an electromagnetic field tilde(A)\tilde{\boldsymbol{A}} interacts with a current at a point xx via a term in the Lagrangian L=J^(mu)(x)A_(mu)(x)\mathcal{L}=J^{\mu}(x) A_{\mu}(x). As we said above, the source of the electromagnetic field in this case is the current of the bb-particle with components (J_(b))^(mu)\left(\boldsymbol{J}_{b}\right)^{\mu}. The resulting tilde(A)\tilde{\boldsymbol{A}} field has components that can be described (after a suitable choice of gauge) by the equation of motion del^(2)A^(mu)=-(J_(b))^(mu)\partial^{2} A^{\mu}=-\left(J_{b}\right)^{\mu} or, equivalently, the momentum-space equation
We can understand the interaction of this field with the other current by computing a scattering amplitude $\mathcal{A}$ for the current of $b$-electrons $\left(\boldsymbol{J}_{b}\right)$ to interact with a current of $a$-electrons $\left(\boldsymbol{J}_{a}\right)$. This has the component form${ }^{9}$
\begin{equation*}
\mathrm{i} \mathcal{A}=(\text { Current })_{a}^{\mu}\binom{\text { Virtual-particle }}{\text { propagator }}_{\mu \nu}(\text { Current })_{b}^{\nu} . \tag{47.4}
\end{equation*}
The part in the middle, called the propagator, tells us the probability amplitude for a virtual, force-carrying particle to interact with current $a$ and current $b$. This is the process shown in Fig. 47.1. The solid lines represent the currents; the wiggly line represents the photon propagator.
Working in momentum space, the flat-space photon propagator giving the amplitude describing a photon with wavevector $\boldsymbol{k}$ is given by a tensor $D$ with components${ }^{10}$
where $\boldsymbol{k}$ is the four-momentum of the photon. (The $\mathrm{i} \epsilon$ factor deals with the fact that $k^{0} \neq|\vec{k}|$ for a virtual particle, but won't be important to us here and so will be dropped.) We therefore consider the amplitude
We can immediately reduce the number of components using the momentum-space representation of the conservation of current $\boldsymbol{\nabla} \cdot \boldsymbol{J}=0$, which implies that $k_{\mu} J^{\mu}=0$. We then have $-k_{0} J^{0}+k_{3} J^{3}=0$, which allows us to eliminate the component $J^{3}$ to yield
Now for some interpretation. The first term is written in terms of $J^{0}=\rho$, the electromagnetic charge density. If we (inverse) Fourier transform this quantity we obtain an instantaneously acting Coulomb potential, which is repulsive between like charges
This only looks (unphysically) instantaneous because we've split up the propagator in a non-covariant manner. Moreover, this is the Coulomb term that dominates in the non-relativistic regime.
The part left over is retarded. That is, it depends on the finite time taken for the photon to propagate. For our case of photons propagating along the $z$- (or 3-) direction, we look at the amplitude of the second term and we see that there seem to be two sorts of photon: those that couple $J^{1}$ currents and those that couple $J^{2}$ currents. These are the two physical transverse photon polarizations. To see this, we decompose the term $\left(J_{a}^{1} J_{b}^{1}+J_{a}^{2} J_{b}^{2}\right)$ into a different basis by writing
This implies that two sorts of photons interact: the $J+\mathrm{i} J$ sort and the $J-\mathrm{i} J$ sort. These are indeed the two possible polarizations for the photons, although they are circularly polarized here (see below), compared to the linearly polarized states discussed in the last chapter.
${ }^{10}$ The details of this equation won't be important to us, apart from the $1 / \boldsymbol{k}^{2}$ part.
If, in the last example, we use coordinates $J_{a}^{1}=j \cos \phi$ and $J_{a}^{2}=j \sin \phi$, we see that the two polarizations can be represented as $j \mathrm{e}^{\mathrm{i} \phi}$ and $j \mathrm{e}^{-\mathrm{i} \phi}$. We conclude that these are circularly polarized photons and, using the angular momentum operator $\hat{L}_{z}=-\mathrm{i} \partial / \partial \phi$, they have spin $\pm 1$ respectively.
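The change of basis used for the transverse photon term can be checked numerically. The following is a minimal Python sketch rather than part of the original argument. It assumes the decomposition takes the form $J_{a}^{1} J_{b}^{1}+J_{a}^{2} J_{b}^{2}=\frac{1}{2}\left[\left(J_{a}^{1}+\mathrm{i} J_{a}^{2}\right)\left(J_{b}^{1}-\mathrm{i} J_{b}^{2}\right)+\left(J_{a}^{1}-\mathrm{i} J_{a}^{2}\right)\left(J_{b}^{1}+\mathrm{i} J_{b}^{2}\right)\right]$ (the overall normalization is an assumption), and the function names and sample values are purely illustrative. It also checks that a rotation by $\theta$ about the $z$-axis multiplies $J^{1}+\mathrm{i} J^{2}$ by the phase $\mathrm{e}^{\mathrm{i} \theta}$.

```python
import cmath
import math

def linear_form(ja, jb):
    """Transverse coupling J_a^1 J_b^1 + J_a^2 J_b^2 for real (J^1, J^2) pairs."""
    return ja[0] * jb[0] + ja[1] * jb[1]

def circular_form(ja, jb):
    """Same coupling in the basis J^1 +/- i J^2 (normalization 1/2 is an assumption)."""
    a_plus, a_minus = complex(ja[0], ja[1]), complex(ja[0], -ja[1])
    b_plus, b_minus = complex(jb[0], jb[1]), complex(jb[0], -jb[1])
    return 0.5 * (a_plus * b_minus + a_minus * b_plus)

def rotate_about_z(j, theta):
    """Rotate the transverse components (J^1, J^2) by angle theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * j[0] - s * j[1], s * j[0] + c * j[1])

# Arbitrary sample currents and angle, chosen only for illustration.
ja, jb, theta = (0.3, -1.2), (0.8, 0.5), 0.7

# The two ways of writing the coupling agree.
print(abs(linear_form(ja, jb) - circular_form(ja, jb)) < 1e-12)

# J^1 + i J^2 picks up the phase exp(i theta) under the rotation.
rotated = rotate_about_z(ja, theta)
plus_before = complex(ja[0], ja[1])
plus_after = complex(rotated[0], rotated[1])
print(abs(plus_after - cmath.exp(1j * theta) * plus_before) < 1e-12)
```

The single power of the phase is what identifies these combinations as the helicity $\pm 1$ states.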
This concludes a round-up of the properties of photons. Next, we turn to gravitons.
47.3 Graviton propagation and polarization
We examine the case of the graviton by following exactly the steps that we followed for the photon. By analogy with the electromagnetic case, we have that the gravitational field $\boldsymbol{h}(x)$, which exists by virtue of a mass distribution being present, has components
The interaction of this field with another mass distribution can be examined by considering two distributions of mass-energy described by energy-momentum tensors $\boldsymbol{T}_{a}$ and $\boldsymbol{T}_{b}$ interacting via a propagator that reflects a massless gravity-carrying particle. Since $\boldsymbol{T}$ is a second-rank tensor, we need to deal with the extra indices, so the analogous scattering amplitude is given by
In the same way that the field $\boldsymbol{A}(x)$ interacts with current $\boldsymbol{J}$ via an interaction term $\mathcal{L}=A_{\mu} J^{\mu}$, we expect the interaction of the gravitational field $\boldsymbol{h}$ and the energy-momentum $\boldsymbol{T}$ to take the form
where $h_{\mu \nu}$ are the components of the weak-field tensor $\boldsymbol{h}(x)$.
Newton's law can be expressed in terms of the instantaneous part of this interaction. Since we know that $T^{00}$ represents $\rho$, the mass density, then we expect the part reflecting Newton's law to be
where the sign ensures gravity is attractive. This equation is the (inverse) Fourier transform of the Newtonian potential energy. The retarded term then gives us the information on the graviton polarizations. We therefore expand out the amplitude and find
One complication in dealing with a second-rank tensor $\boldsymbol{T}$ is that it carries around an invariant (i.e. scalar and therefore $S=0$) part: its trace $T$. The consequence of this is that when we split up the amplitude $\mathcal{A}$ in terms of the components of $\boldsymbol{T}$, it can erroneously appear that the graviton has three polarizations, rather than the two it must have. The remedy is to use this trace part by adding to our amplitude $\mathcal{A}=T_{\mu \nu}^{\prime}\left(1 / \boldsymbol{k}^{2}\right) T^{\mu \nu}$ a multiple of the trace part
\begin{equation*}
\alpha T^{\prime}\left(\frac{1}{\boldsymbol{k}^{2}}\right) T \tag{47.16}
\end{equation*}
where $T$ and $T^{\prime}$ denote traces. Here $\alpha$ is a constant that we are free to choose in order that we cancel off any illusory $S=0$ part and leave only $S=2$ gravitons. Let's set about decomposing the amplitude.
Example 47.3
As in the photon case, we use the momentum-space version of conservation of mass-energy, $\boldsymbol{\nabla} \cdot \boldsymbol{T}=0$, to write
which gives us
Now we choose $\alpha$ so that there are only two terms in the $S=2$ retarded part (reflecting the two polarizations). This is achieved by setting $\alpha=1 / 2$ and so we obtain
Since we can use the symmetry of $\boldsymbol{T}$ to rewrite $2 T_{b}^{12}=\left(T_{b}^{12}+T_{b}^{21}\right)$, we can write the retarded part of the amplitude as
We conclude that there are two graviton polarizations: $\left(T_{b}^{11}-T_{b}^{22}\right)$ and $\left(T_{b}^{12}+T_{b}^{21}\right)$.
The field required to generate these gravitons takes the form
${ }^{11}$ This is examined further in the exercises.
${ }^{12}$ The problem here is related to renormalization. Calculations of perturbation theory in QFT often lead to infinities. Unlike the case in some other QFTs, the infinities in gravitational perturbation theory, encountered at higher orders of scattering, cannot be removed by the set of techniques, known as renormalization, that have proved successful in removing infinities in theories like quantum electrodynamics or quantum chromodynamics. For example, if we compute the amplitude for gravitons to scatter from gravitons at energy $E$, perturbation theory predicts the amplitude is given by a series of the form $G\left[1+G E^{2}+\left(G E^{2}\right)^{2}+\ldots\right]$, where $G$ is the gravitational constant. Once the energy scale $E$ reaches $G^{-\frac{1}{2}}$ this approach fails as the series diverges. (By analogy with the closely related Fermi theory of the weak interaction we might expect some new physics to appear at this energy scale.) The infinities encountered in this approach are avoided to an extent in string theory, which follows a similar route to QFT and is described in Chapter 49. See Zee's Quantum Field Theory in a Nutshell (2003) for a discussion of the analogy between graviton-graviton scattering and the Fermi theory.
How do we know the expressions identified in the last example are the correct polarizations for an $S=2$ field? For circularly polarized gravitons, we must be able to shift to a coordinate system where the phases behave as $\mathrm{e}^{2 \mathrm{i} \theta}$ and $\mathrm{e}^{-2 \mathrm{i} \theta}$. As can be checked, it is possible to rewrite the polarization part of the retarded term as
Each bracket has the form $(x x-y y \pm 2 \mathrm{i} x y)$, which is equivalent to $(x \pm \mathrm{i} y)(x \pm \mathrm{i} y)$. Since each of the bracketed terms in this last expression can be represented as a phase $\mathrm{e}^{ \pm \mathrm{i} \theta}$, their product has the required $\mathrm{e}^{ \pm 2 \mathrm{i} \theta}$ phase.${ }^{11}$
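The doubled phase can be verified directly. The sketch below is an illustrative check, not the book's own calculation. It assumes the relevant combinations are $T^{11}-T^{22} \pm 2 \mathrm{i} T^{12}$ for a symmetric transverse tensor, rotates the tensor about the $z$-axis as $T^{\prime}=R T R^{\mathrm{T}}$, and confirms that each combination is multiplied by $\mathrm{e}^{ \pm 2 \mathrm{i} \theta}$. The helper names and the sample tensor are hypothetical.

```python
import cmath
import math

def rotate_tensor(t, theta):
    """Rotate a 2x2 transverse tensor about the z-axis: T' = R T R^T."""
    c, s = math.cos(theta), math.sin(theta)
    r = ((c, -s), (s, c))
    return tuple(
        tuple(
            sum(r[i][k] * t[k][l] * r[j][l] for k in range(2) for l in range(2))
            for j in range(2)
        )
        for i in range(2)
    )

def helicity_combination(t, sign):
    """The combination T^11 - T^22 + sign * 2i T^12, with sign = +1 or -1."""
    return complex(t[0][0] - t[1][1], sign * 2.0 * t[0][1])

# Arbitrary symmetric sample tensor and rotation angle (illustrative values only).
t = ((0.3, -0.7), (-0.7, 1.1))
theta = 0.4

for sign in (+1, -1):
    before = helicity_combination(t, sign)
    after = helicity_combination(rotate_tensor(t, theta), sign)
    # Each combination should pick up the phase exp(2i * sign * theta).
    print(sign, abs(after - cmath.exp(2j * sign * theta) * before) < 1e-12)
```

The trace $T^{11}+T^{22}$ drops out of both combinations, which is consistent with the removal of the $S=0$ part described above.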
Example 47.4
From the last example we see that the amplitude can be written as
We can then spot two things. The first is that the amplitude $\mathcal{A}$ can be rewritten in terms of the graviton propagator as $\mathcal{A}=T^{\prime \sigma \tau} D_{\sigma \tau \mu \nu} T^{\mu \nu}$, from which we find an expression for the propagator of
Since the amplitude $\mathcal{A}$ is also proportional to $h_{\mu \nu} T_{a}^{\mu \nu}$, we conclude that the amplitude of gravitons emitted from a source can be written in terms of a field as
The part in brackets is, of course, familiar from the Einstein equation, so it is heartening to see it appear from this completely different approach. In fact, this equation is recognizable as the momentum-space version of the weak-field Einstein equation.
Although we might feel pleased with the progress made in this chapter, it's sobering to remember that nobody has yet quantized gravity consistently. The approach suggested here, which has perturbation theory at its root, has been attempted several times, but does not lead to a consistent theory.${ }^{12}$ A successful quantum field theory of gravity might still be expected to result in a prediction of the graviton excitations with properties something like those that we have discussed here. (However, there is nothing to guarantee this.) In Chapter 49, we shall look at some possible avenues for this project of finding quantum gravity. In order to get there, we shall need to make some more room in spacetime by considering the possibility of spacetime with more than $(3+1)$ dimensions, which is our next subject.
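The failure of the perturbative series quoted in the footnote on renormalization can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that every omitted numerical coefficient in $G\left[1+G E^{2}+\left(G E^{2}\right)^{2}+\ldots\right]$ equals one, so that the series is geometric in $G E^{2}$. The true coefficients are not given in the text, and the function name is hypothetical.

```python
def partial_sum(g, energy, n_terms):
    """First n_terms of G[1 + GE^2 + (GE^2)^2 + ...], unit coefficients assumed."""
    x = g * energy ** 2
    return g * sum(x ** n for n in range(n_terms))

g = 1.0  # units chosen so that G = 1; the critical energy G**-0.5 is then 1
for energy in (0.5, 0.9, 1.0, 1.1):
    # Below E = 1 the partial sums settle down; at and above it they keep growing.
    print(energy, [partial_sum(g, energy, n) for n in (5, 20, 80)])
```

Under this assumption the partial sums approach $G /\left(1-G E^{2}\right)$ for $E<G^{-\frac{1}{2}}$ and grow without bound otherwise, which is the sense in which the expansion stops being useful at that scale.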
Chapter summary
The graviton is a force-carrying particle with spin $S=2$ and two polarizations.
The graviton has spin $S=2$ because its source is a second-rank tensor $T^{\mu \nu}$; the photon has spin $S=1$ because its source is a first-rank tensor $J^{\mu}$.
In scattering theory, the gravitational interaction between masses can be written in terms of an amplitude as
The perturbative scattering approach fails to describe gravitation at higher orders of perturbation theory.
Exercises
(47.1) Verify that eqn 47.26 is equivalent to the retarded term in the graviton amplitude.
(47.2) The polarization vectors of a spin-1 particle change according to $\left(\boldsymbol{\epsilon}_{\mu}^{\prime}\right)^{i}=R^{i}{ }_{j}(\theta)\left(\boldsymbol{\epsilon}_{\mu}\right)^{j}$, where the rotation matrix is given by
Find combinations of the linear polarization vectors that obey the transformation law
\begin{equation*}
R^{i}{ }_{j}(\theta)\left(\boldsymbol{\epsilon}_{h}\right)^{j}=\mathrm{e}^{\mathrm{i} h \theta}\left(\boldsymbol{\epsilon}_{h}^{\prime}\right)^{i} \tag{47.32}
\end{equation*}
for helicities $h=1,0$ and $-1$.
For photons, only the $h=1$ and $h=-1$ helicities are found in Nature.
(47.3) For gravitons, polarizations are expressed as $(0,2)$ tensors $\boldsymbol{\epsilon}$ with components $\epsilon_{i j}$ that transform as
where the rotation matrix is the same as in the last question. (Note the slightly awkward ordering of the components here. We take $R^{i}{ }_{j}=R_{i}{ }^{j}$ in any case.) For a graviton travelling along $z$, we found in the previous chapter that the only nonzero components are $\epsilon_{11}=-\epsilon_{22}$ and $\epsilon_{12}=\epsilon_{21}$.
(a) Show that under the transformation we obtain
(b) Find linear combinations of the polarization tensors that yield the two polarization states of the graviton with $h= \pm 2$.
(47.4) (a) Show that the gravitational wave power $L$ from a binary system of two masses, $m_{1}$ and $m_{2}$, separated by distance $a$, and in a circular orbit about their centre of mass, is given by
Estimate this quantity for the Earth-Sun system. Also, estimate the number of gravitons emitted per second.
(b) Estimate how long it would take you to emit a single graviton by frantically waving your arms around in the air.
48 Higher dimensional spacetime
48.1 Gauge transformations in five dimensions
48.2 Unifying electromagnetism and gravitation
Chapter summary
Exercises
${ }^{1}$ Theodor Kaluza (1885-1954). It is said that he taught himself to swim by reading a book, resulting in him successfully swimming on his first attempt.
${ }^{2}$ We follow the approach of Zee in this chapter. Kaluza's theory was rediscovered and developed by Oskar Klein (1894-1977) in 1926, who provided a quantum mechanical description of the theory. Gunnar Nordström had also independently developed a related theory before Kaluza.
${ }^{3}$ Max Planck (1858-1947) was an early champion of special relativity, extending the theory by formulating the relativistic action. Planck and Einstein were close friends who would meet to play music together. In his biography of Einstein, Abraham Pais notes Einstein's profound respect for Planck, both as a scientist and as a deeply principled human being.
${ }^{4}$ Remember the conceptual equation
The idea that this can be achieved through a five dimensional cylinder-world has never occurred to me and would seem to be altogether new. I like your idea at first sight very much. Albert Einstein, letter to Theodor Kaluza (1919)
General relativity presents us with a classical field theory of gravitation expressed using the tools of geometry. In Chapter 42, we met the classical field theory of electromagnetism expressed in similar geometric language. It's natural to ask whether gravitation and electromagnetism can be combined in such a way that they naturally arise as different facets of some master theory. This is the project of unification which, in a broader, modern sense, involves a combination of gravity and the standard model of particle physics. In this chapter, we examine an attempt, originally made by Theodor Kaluza${ }^{1}$ in 1919, to use the gauge structure of electromagnetism and gravity to combine these interactions. The solution, known as Kaluza-Klein theory,${ }^{2}$ involves adding an extra spatial dimension to spacetime.
Since, in this chapter, we shall be comparing theories in different numbers of dimensions, it will be helpful to make the action of our gravitation theory dimensionless. We do this by employing some dimensional analysis. We use units where $c=1$ and also where Planck's constant${ }^{3}$ $\hbar=1$. In such units, a mass has units of $1 /$(length) or $1 / L$. The Einstein-Hilbert action was previously written as
\begin{equation*}
S_{\mathrm{EH}}=\int \mathrm{d}^{4} x \sqrt{-g} R(\boldsymbol{g}), \tag{48.1}
\end{equation*}
where we write the Ricci scalar as $R(\boldsymbol{g})$ to remind us that it is derived from the first and second derivatives of the components of the metric tensor.${ }^{4}$ The components of the metric tensor $\boldsymbol{g}$ are dimensionless. The Ricci scalar $R$ involves two derivatives of the metric and therefore carries units of $1 / L^{2}$. As a result, $S_{\mathrm{EH}}$, with its four contributions of length from $\mathrm{d}^{4} x$, has units $L^{2}$. In order to make it dimensionless, we multiply by two powers of mass $m_{\mathrm{P}}$ with the result that
\begin{equation*}
S_{\mathrm{EH}}=\int \mathrm{d}^{4} x \sqrt{-g} m_{\mathrm{P}}^{2} R(\boldsymbol{g}) . \tag{48.2}
\end{equation*}
The mass we choose sets the scale of gravitational interactions and is known as the Planck mass. We shall discuss this quantity further in Chapter 49.
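The dimensional bookkeeping behind the factor $m_{\mathrm{P}}^{2}$ can be written as a one-line count. The sketch below is only an illustration of that counting in units where $c=\hbar=1$. The function name is hypothetical, and the generalization to an arbitrary number of spacetime dimensions is an assumption that simply repeats the same argument.

```python
def mass_powers_needed(spacetime_dims):
    """Powers of mass that make the integral of sqrt(-g) R over d^D x dimensionless."""
    length_powers_from_measure = spacetime_dims  # d^D x contributes L^D
    length_powers_from_ricci = -2                # two derivatives of a dimensionless metric
    # With c = hbar = 1 a mass has units 1/L, so each mass factor removes one power of L.
    return length_powers_from_measure + length_powers_from_ricci

print(mass_powers_needed(4))  # four-dimensional Einstein-Hilbert action
print(mass_powers_needed(5))  # the five-dimensional case considered later in the chapter
```

The same count explains why the five-dimensional action in Section 48.2 carries three powers of a mass rather than two.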
48.1 Gauge transformations in five dimensions
Kaluza's scheme for unifying electromagnetism and gravity can be understood by (once again) comparing the structure of the gauge transformations in electromagnetism and in gravitation. We saw in Chapter 42 that the gauge transformation in electromagnetism${ }^{5}$ can be written in terms of the components of the electromagnetic 1-form $\tilde{\boldsymbol{A}}$ as
\begin{equation*}
A_{\mu}(x) \rightarrow A_{\mu}(x)-\partial_{\mu} \chi(x) . \tag{48.3}
\end{equation*}
This transformation has no effect on the Faraday 2-form $\tilde{\boldsymbol{F}}=\boldsymbol{d} \tilde{\boldsymbol{A}}$ and hence on the underlying equations of motion of the electromagnetic fields.
From the point of view of Chapter 44, the field $\tilde{\boldsymbol{A}}$ was mandated by changes made in the internal phase variable $\theta(\mathcal{P}) \rightarrow \theta(\mathcal{P})+\alpha(\mathcal{P})$, where $\mathcal{P}$ labels a point in space. This variable has its information stored in a bundle of fibres, one at each point $\mathcal{P}$, floating above the spacetime. In the weak-field limit of gravitation, we recall that the gauge structure is derived by considering invariance under a set of infinitesimal coordinate transformations $x^{\mu}(\mathcal{P}) \rightarrow x^{\mu}(\mathcal{P})+\xi^{\mu}(\mathcal{P})$, which causes the components of the tensor $\boldsymbol{h}$ to change according to
\begin{equation*}
h_{\mu \nu} \rightarrow h_{\mu \nu}-\xi_{\mu, \nu}-\xi_{\nu, \mu} . \tag{48.4}
\end{equation*}
The key to unification is to treat the internal variable $\theta$ as describing a coordinate in spacetime. That is, we regard a coordinate that was previously stored in a fibre as now describing a position in spacetime. This allows us to combine the electromagnetic and gravitational gauge transformations in a manner where they all derive from a single set of infinitesimal coordinate transformations, and reveals the structure needed to unify the two interactions.
Kaluza's inspired idea was therefore to add to our four-dimensional coordinates $x^{\mu}=\left(x^{0}, x^{1}, x^{2}, x^{3}\right)$ a fifth coordinate $x^{5}$ to form the five-dimensional coordinate set${ }^{6}$ $X^{a}=\left(x^{0}, x^{1}, x^{2}, x^{3}, x^{5}\right)$. We now demand invariance under the infinitesimal coordinate transformation
where the five-dimensional version of the Minkowski metric has components $\eta_{a b}=\operatorname{diag}(-1,1,1,1,1)$. We have again the gauge transformation property of the metric components that $h_{a b} \rightarrow h_{a b}-\xi_{a, b}-\xi_{b, a}$.
We now use the fifth dimension to accommodate the electromagnetic gauge freedom. To see this, set the index $b=5$ and we have
\begin{equation*}
h_{a 5} \rightarrow h_{a 5}-\xi_{a, 5}-\xi_{5, a} . \tag{48.7}
\end{equation*}
${ }^{5}$ Remember that this arose from demanding local phase invariance for the matter fields that fill spacetime.
${ }^{6}$ In what follows we always let $\mu$ run over values 0-3 and we let $a=0,1,2,3$ and 5. The reason for the introduction of $x^{5}$, rather than the more logical $x^{4}$, is historical: it was conventional to call the timelike component $x^{4}$ in the older literature, instead of the more modern choice of $x^{0}$.
${ }^{7}$ Remembering that Greek indices like $\mu$ run over 0-3 only.
${ }^{8}$ In these matrix equations, the usual $(3+1)$ dimensions described by Greek indices live in the top left block $M_{\mu \nu}$, with the new, fifth dimension in the bottom right component $M_{55}$. The off-diagonal components $M_{\mu 5}$ and $M_{5 \mu}$ mix $(3+1)$ dimensions and the fifth dimension.
We then (i) set $h_{\mu 5}=\ell A_{\mu}$, where $\ell$ is an arbitrary length; and (ii) assume $\xi_{\mu}$ is independent of $x^{5}$. The result is that eqn 48.7 becomes
\begin{equation*}
A_{\mu} \rightarrow A_{\mu}-\frac{1}{\ell} \partial_{\mu} \xi_{5}, \tag{48.8}
\end{equation*}
which is identical to the electromagnetic gauge transformation in eqn 48.3 if we set $\chi=\xi_{5} / \ell$. In fact, if we take${ }^{7}$ $\xi_{\mu}=0$ then the electromagnetic gauge transformation becomes
In summary, the electromagnetic gauge transformation has been absorbed into the infinitesimal coordinate transformation. Specifically, we recall that the 1-form $\tilde{\boldsymbol{A}}=A_{\mu} \boldsymbol{d} x^{\mu}$ transforms according to
\begin{equation*}
\tilde{\boldsymbol{A}} \rightarrow \tilde{\boldsymbol{A}}-\boldsymbol{d} \chi . \tag{48.10}
\end{equation*}
Since the coordinate $x^{5}$ transforms according to $x^{5} \rightarrow x^{5}+\ell \chi$ then we have $\boldsymbol{d} x^{5} \rightarrow \boldsymbol{d} x^{5}+\ell \boldsymbol{d} \chi$, and we can spot that the combination $\left(\boldsymbol{d} x^{5}+\ell \tilde{\boldsymbol{A}}\right)$ is gauge invariant. This quantity then, linking a spatial coordinate and the electromagnetic field, is key to unification.
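The cancellation behind this gauge invariance can be written out in one line. The following is a sketch that assumes the electromagnetic 1-form transforms as $\tilde{\boldsymbol{A}} \rightarrow \tilde{\boldsymbol{A}}-\boldsymbol{d} \chi$ (the component form $A_{\mu} \rightarrow A_{\mu}-\partial_{\mu} \chi$ used earlier) and that $\chi$ depends only on the four-dimensional coordinates $x^{\mu}$:

```latex
\begin{align*}
\boldsymbol{d} x^{5}+\ell \tilde{\boldsymbol{A}}
&\rightarrow\left(\boldsymbol{d} x^{5}+\ell\, \boldsymbol{d} \chi\right)+\ell\left(\tilde{\boldsymbol{A}}-\boldsymbol{d} \chi\right) \\
&=\boldsymbol{d} x^{5}+\ell \tilde{\boldsymbol{A}} .
\end{align*}
```

The shift in the coordinate differential and the shift in the potential enter with opposite signs and the same factor of $\ell$, which is why the same length $\ell$ appears in both $h_{\mu 5}=\ell A_{\mu}$ and $x^{5} \rightarrow x^{5}+\ell \chi$.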
From the point of view of the gauge structure, the extra dimension can be used to bring the electromagnetic gauge transformation down from the fibre bundle and into the heart of spacetime itself. Motivated by this, we shall see in the next section how the extra structure can be used to build a metric that incorporates electromagnetism and gravitation.
48.2 Unifying electromagnetism and gravitation
To simplify our notation a little, let's call the $x^{5}$ coordinate $z$. We write the action for the enlarged spacetime as
\begin{equation*}
S=\int \mathrm{d}^{4} x \mathrm{~d} z \sqrt{-H} m_{\mathrm{K}}^{3} R(\boldsymbol{H}) \tag{48.11}
\end{equation*}
where an extra mass factor $m_{\mathrm{K}}$ has been included since there are now 5 powers of length in the terms $\mathrm{d}^{4} x \mathrm{~d} z$. The mass $m_{\mathrm{K}}$ sets the scale for five-dimensional gravity, just as the Planck mass $m_{\mathrm{P}}$ did in four dimensions. The form of the metric is strongly constrained if we stipulate that it must be gauge invariant. Since the gauge transformation changes $z \rightarrow z+\ell \chi$, there is only one way a gauge invariant metric line element can be constructed. The line element must be
This gives us contributions to the action from gravitation and from electromagnetism,
\begin{equation*}
S=\int \mathrm{d}^{4} x \mathrm{~d} z \sqrt{-H}\left(R(\boldsymbol{g})-\frac{1}{4} F^{\mu \nu} F_{\mu \nu}\right), \tag{48.15}
\end{equation*}
where $H=\operatorname{det} H_{a b}$ and with the electromagnetic part having the Lagrangian $\mathcal{L}=-\frac{1}{4} F^{\mu \nu} F_{\mu \nu}$ that we met in Chapter 42.
We have seen that, for the cost of adding an extra dimension to space, gravitation and electromagnetism can be combined. But if this is a description of reality, where is this extra dimension and, since we don't appear to have detected it in our measurements, how can it be explored?
Taking our lead from the structure of our fibres in Chapter 44, we propose that the extra dimension is hidden from us by virtue of its being wound, or compactified, into a very small circle of radius $a$. As a result, $z \equiv x^{5}$ varies in the range $0 \leq x^{5} \leq 2 \pi a$. This curvature of space is permitted by general relativity and, since in our units energy has units $1 / L$, we see that potentially enormous energies would be needed to experimentally resolve the dimension if it is small enough.${ }^{10}$ Put another way, in order to escape into this dimension, a particle would need (a huge) momentum of order $p \approx 1 / a$. We therefore have the picture of spacetime in Fig. 48.1. It has, at each point $x^{\mu}$ in its $(3+1)$-dimensional subspace, a dimension resembling a tiny, circular knob. The electromagnetic gauge transformation $x^{5} \rightarrow x^{5}+\ell \chi\left(x^{\mu}\right)$ corresponds to a rotation of the knobs by different amounts at each point.
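To get a feel for the estimate $p \approx 1 / a$, the momentum can be converted into conventional units using $\hbar c \approx 197.327\ \mathrm{MeV\,fm}$. The sketch below is an order-of-magnitude illustration only: the radii are hypothetical values chosen for the example, since the text does not fix $a$, and the function name is made up for this purpose.

```python
HBAR_C_MEV_FM = 197.3269804  # hbar * c in MeV fm (standard conversion constant)

def energy_to_resolve_mev(radius_m):
    """Rough energy p ~ 1/a, in MeV, needed to probe a circle of radius a (in metres)."""
    radius_fm = radius_m / 1e-15  # 1 fm = 1e-15 m
    return HBAR_C_MEV_FM / radius_fm

# Hypothetical radii, chosen only to show how the energy scales with 1/a.
for radius_m in (1e-15, 1e-18, 1e-30):
    print(radius_m, energy_to_resolve_mev(radius_m))
```

Each factor of ten reduction in the radius raises the required energy by a factor of ten, so a sufficiently small circle is out of reach of any given experiment.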
Example 48.2
Using this idea we can link the interaction scale $m_{\mathrm{K}}$ to the Planck mass $m_{\mathrm{P}}$. In the absence of an electromagnetic field, we have $H_{\mu \nu}=g_{\mu \nu}$, $H_{\mu 5}=0$ and $H_{55}=1$. The action then becomes
\begin{equation*}
S=2 \pi a m_{\mathrm{K}}^{3} \int \mathrm{d}^{4} x \sqrt{-g} R(\boldsymbol{g})
\end{equation*}
and so we recognize from eqn 48.2 that $m_{\mathrm{P}}^{2}=2 \pi a m_{\mathrm{K}}^{3}$.
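The relation $m_{\mathrm{P}}^{2}=2 \pi a m_{\mathrm{K}}^{3}$ can be inverted to see how the five-dimensional scale depends on the size of the extra dimension. The following sketch works in units where $c=\hbar=1$, so that $a$ is measured in inverse masses. The function names and the sample radii are illustrative assumptions, not values from the text.

```python
import math

def kaluza_scale(m_planck, radius):
    """Solve m_P^2 = 2 pi a m_K^3 for m_K (radius in inverse-mass units)."""
    return (m_planck ** 2 / (2.0 * math.pi * radius)) ** (1.0 / 3.0)

def planck_mass(m_k, radius):
    """Recover m_P from m_K and the radius using the same relation."""
    return math.sqrt(2.0 * math.pi * radius * m_k ** 3)

m_p = 1.0  # measure every mass in units of the Planck mass
for radius in (1.0, 1e3, 1e6):  # hypothetical radii in units of 1/m_P
    m_k = kaluza_scale(m_p, radius)
    # The last column should reproduce m_p, as a consistency check.
    print(radius, m_k, planck_mass(m_k, radius))
```

For fixed $m_{\mathrm{P}}$, a larger circle corresponds to a lower five-dimensional scale $m_{\mathrm{K}}$, falling off as $a^{-1 / 3}$.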
We can use the Kaluza-Klein metric $\boldsymbol{H}$ to work out the equations of motion for a particle in flat $(3+1)$-dimensional spacetime.
Example 48.3
We start from the action for a particle,${ }^{11}$ including the coordinate $z$, written as
${ }^{9}$ See the exercises and also the book by Zee (2013) for the details of how this is done.
${ }^{10}$ This theme of hidden, compactified dimensions is one we pick up in Chapter 49.
Fig. 48.1 Spacetime in the Kaluza-Klein theory. At every point in three-dimensional space an extra dimension can be found, wound up into a small circle of radius $a$.
${ }^{11}$ This is simply an extension of the usual action for a particle in an electromagnetic field of $S=-m \int \sqrt{-\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}}+q \int A_{\sigma} \mathrm{d} x^{\sigma}$ from Chapter 42.
${ }^{12}$ These are $F_{\mu \nu}=A_{\nu, \mu}-A_{\mu, \nu}$.
${ }^{13}$ In the last chapter, for example, we predicted with quantum field theory (QFT) that the amplitude for graviton-graviton scattering varies with energy $E$ as $G\left(1+G E^{2}+\ldots\right)$, where $G$ is the gravitational constant. Effective field theory says that such an approach is permissible as long as we confine ourselves to the realm of applicability of the theory. In this case, this is $E \ll G^{-\frac{1}{2}}$. As discussed in the next chapter, this limiting energy scale is that of the Planck mass $m_{\mathrm{P}}$.
${ }^{14}$ Effective theories are described in more detail in Zee. This point of view has been influential in Condensed Matter and in Particle Physics, where it follows from analysis using renormalization group (RG) techniques. See our Quantum Field Theory for the Gifted Amateur (2014) for a description of RG.
Using the Euler-Lagrange equations, we end up with two equations of motion. The first says that the momentum $p^{z}$ in the $z$-direction is constant and given by
which becomes, on collecting the components of the Faraday tensor${ }^{12}$ $\boldsymbol{F}$ and doing some simplifying,
\begin{equation*}
m \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}=\left(p^{z} \ell\right) F^{\mu}{ }_{\nu} u^{\nu} \tag{48.20}
\end{equation*}
where $u^{\nu}$ are components of the particle's velocity $\boldsymbol{u}$. Comparing with what we had in Chapter 42, we see that the electromagnetic charge is given by $q=p^{z} \ell$. In words: the momentum along the $z$-direction tells us $q$, the strength of the interaction between the particle and the electromagnetic field $\tilde{\boldsymbol{A}}$. Finally, the wavefunction for a particle confined to a circle has the form $\psi(z) \propto \mathrm{e}^{\mathrm{i} p^{z} z}$ and, in order for the wavefunction to be single valued, we require a quantized $\left(p^{z}\right)_{n}=2 \pi n / 2 \pi a=n / a$, where $n$ is an integer. This implies that electric charge in this picture must be quantized in units of
\begin{equation*}
q=\frac{\ell}{a}, \tag{48.21}
\end{equation*}
that is, the ratio of length $\ell$ and the radius of the extra dimension $a$.
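The quantization argument can be made concrete with a short check. The sketch below takes $\psi(z) \propto \mathrm{e}^{\mathrm{i} p^{z} z}$ on a circle of circumference $2 \pi a$, tests whether a given momentum returns the wavefunction to itself after one circuit, and evaluates the charge $q=p^{z} \ell$ for the allowed momenta $n / a$. The function names and numerical values are hypothetical and serve only to illustrate the argument.

```python
import cmath
import math

def allowed_momentum(n, a):
    """Quantized momentum (p^z)_n = n / a on a circle of radius a."""
    return n / a

def charge(n, ell, a):
    """Charge q = p^z * ell for the n-th allowed momentum, i.e. n * ell / a."""
    return allowed_momentum(n, a) * ell

def is_single_valued(p_z, a, tol=1e-9):
    """True if exp(i p^z z) is unchanged when z increases by the circumference 2 pi a."""
    return abs(cmath.exp(1j * p_z * 2.0 * math.pi * a) - 1.0) < tol

ell, a = 0.1, 0.5  # illustrative lengths in arbitrary units
for n in range(4):
    p_z = allowed_momentum(n, a)
    print(n, p_z, is_single_valued(p_z, a), charge(n, ell, a))

# A momentum halfway between allowed values fails the single-valuedness test.
print(is_single_valued(0.5 / a, a))
```

The charges form an evenly spaced ladder with spacing $\ell / a$, which is the unit of charge identified above.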
The unification that Kaluza-Klein theory achieves is very interesting, but ultimately we still lack a quantum theory of gravitation that combines gravity and the standard model of particle physics. The search for one is the subject of our next chapter. Before closing this chapter, we can use the tools we have developed to address a different extension of general relativity: what if, instead of extra dimensions, there are extra interactions?
Example 48.4
The field-theory approach allows us to treat general relativity as an effective theory. The idea here is that our observations are made at low energies and long length scales (relative to some very small length scale $\ell$, for example).${ }^{13}$ As a result, the terms in the gravitational action that determine the equations of motion that we can probe in our observations are those where the fields (such as the metric) vary most slowly. It might be that there are really higher order terms in the relativistic action where the fields vary more rapidly, while the ones we have identified represent an effective, low-energy approximation to a more complete theory of gravitation.${ }^{14}$ The higher order terms are those scalars that involve more derivatives of the metric, so will combine more multiples of the objects formed from the components of $\boldsymbol{R}$. We can use these to upgrade our Einstein-Hilbert action from $S_{\mathrm{EH}}=m_{\mathrm{P}}^{2} \int \mathrm{d}^{4} x \sqrt{-g} R$ to
\begin{equation*}
S_{\mathrm{EH}}^{\prime}=m_{\mathrm{P}}^{2} \int \mathrm{d}^{4} x \sqrt{-g}\left[R+\ell^{2}\left(\alpha R^{2}+\beta R_{\mu \nu} R^{\mu \nu}+\gamma R_{\mu \nu \sigma \rho} R^{\mu \nu \sigma \rho}\right)+\ldots\right], \tag{48.22}
\end{equation*}
where $\ell$ is a length and $\alpha, \beta$ and $\gamma$ are constants. The introduction of the length $\ell$ is required on dimensional grounds, since the terms in the brackets involve two more derivatives of the metric field than $R$ does. This length allows us to fix the scale at which these extra interactions are important, just as an analogous quantity allowed us to pick out the scale at which extra dimensions are important earlier in the chapter.
Fixing a probable value of $\ell$ will occupy us in the next chapter.
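A rough sense of when the higher order terms matter follows from dimensional analysis alone. The sketch below assumes, as an order-of-magnitude estimate only, that the curvature scalars are of size $1 / L^{2}$ for some curvature length scale $L$, and that the constants $\alpha, \beta$ and $\gamma$ are of order one. Under those assumptions the quadratic terms are suppressed relative to $R$ by $(\ell / L)^{2}$. The function name and the sample numbers are hypothetical.

```python
def relative_size_of_correction(ell, curvature_length):
    """Estimate (ell^2 R^2) / R assuming R ~ 1 / L^2 and order-one coefficients."""
    return (ell / curvature_length) ** 2

# Illustrative ratios only: the text does not fix ell or the curvature scale.
for curvature_length in (1.0, 1e3, 1e10):
    print(curvature_length, relative_size_of_correction(1.0, curvature_length))
```

The corrections become comparable to the Einstein-Hilbert term only when the curvature scale approaches $\ell$, which is the sense in which $\ell$ marks the scale of the extra interactions.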
Chapter summary
Kaluza-Klein theory combines electromagnetism and gravitation via their gauge structure by introducing an extra spatial dimension described by a coordinate $x^{5}$.
A metric for the resulting $(4+1)$-dimensional spacetime incorporates the metric for $(3+1)$-dimensional spacetime along with the electromagnetic field $\tilde{\boldsymbol{A}}(x)$.
The extra dimension is wound into a tiny circle. Escaping into this dimension costs enormous amounts of energy.
Exercises
(48.1) We shall use the metric in eqn 48.13 to show eqn 48.14, using the method from Chapter 36. We follow the steps in the textbook by Zee, which should be consulted for further discussion.
We absorb the factor of $\ell$ into $A_{\mu}$ so that the metric is
where $\boldsymbol{\Omega}^{\hat{\alpha}}{}_{\hat{\beta}}$ are the connection 1-forms for the usual four-dimensional metric.
(c) By using Idea 2 from Chapter 36, verify that
\begin{equation*}
\mathcal{R}^{\hat{\alpha}}{}_{\hat{\beta}}=\boldsymbol{d} \boldsymbol{\Omega}^{\hat{\alpha}}{}_{\hat{\beta}}-\frac{1}{2} F^{\hat{\alpha}}{}_{\hat{\beta}, \hat{\gamma}} \boldsymbol{\omega}^{\hat{\gamma}} \wedge \boldsymbol{\omega}^{\hat{5}}-\frac{1}{2} F^{\hat{\alpha}}{}_{\hat{\beta}} \boldsymbol{d} \boldsymbol{\omega}^{\hat{5}}-\left(\frac{1}{2} F^{\hat{\alpha}}{}_{\hat{\gamma}} \boldsymbol{\omega}^{\hat{\gamma}}\right) \wedge\left(\frac{1}{2} F_{\hat{\beta} \hat{\delta}} \boldsymbol{\omega}^{\hat{\delta}}\right).
\end{equation*}
(d) Since we are only trying to compute the Ricci scalar, terms containing $\boldsymbol{\omega}^{\hat{\gamma}} \wedge \boldsymbol{\omega}^{\hat{5}}$ do not contribute. Use this fact to express the useful part of the curvature 2-form as
where $\tilde{\mathcal{R}}^{\hat{\alpha}}{}_{\hat{\beta}}$ is the four-dimensional part.
(e) Use this to show that the components of the Riemann tensor are given by
where $\tilde{R}^{\hat{\alpha}}{}_{\hat{\beta} \hat{\gamma} \hat{\delta}}$ is again the four-dimensional part.
(f) Turning now to the $\hat{5}$-components, verify that
What are the stars? ... They are bits of fire a few kilometres away. We could reach them if we wanted to. Or we could blot them out. The earth is the centre of the universe. The sun and the stars go round it.
George Orwell (1903-1950) Nineteen Eighty-Four
The fundamental interactions in Nature are gravitational, electromagnetic, weak and strong. In the last chapter, we met a scheme aimed at unifying electromagnetism and gravitation into a single master theory, using the concept of a gauge field. It relied on having access to an extra dimension in which to work. The project to combine the fundamental forces of nature into a single theory was one of the greatest triumphs of twentieth century physics. Quantum field theory (QFT) allowed (i) electromagnetism along with (ii) the weak and (iii) the strong interactions to be combined into a quantum gauge field theory known as the Standard Model of particle physics. The fundamental force that resists being incorporated into this scheme on the same basis as the others is gravitation.$^{1}$ Starting from the attempt of last chapter to accommodate gravitation through extra spatial dimensions, we shall introduce some more recent attempts that try to make sense of the place of gravitation in the quantum world.$^{2}$
49.1 Extra dimensions
We saw in the last chapter the power of having extra spatial dimensions available in formulating theories. If these extra dimensions exist, we should ask ourselves why we haven't yet been able to detect them through their quantum-mechanical effects, given the ease with which we have detected the three spatial dimensions and single time dimension of conventional field theory. In the last chapter, we guessed that this was due to the large energies involved in their small scale, an intuition we confirm in the next example. The idea is that, in quantum mechanics, a particle can lower its kinetic energy if it spreads its wavefunction over a larger region. So if there are extra dimensions that a quantum particle can access, this leads to a different structure of energy levels compared to the case where the particle is confined to three spatial dimensions.
49.1 Extra dimensions
49.2 String theory
49.3 Parametrizing the string
49.4 Strings in relativity
49.5 Superspace
49.6 Loop quantum gravity
49.7 Anti-de Sitter spacetime
49.8 Our current best guess
Chapter summary
Exercises
$^{1}$ If we accept the effective-field point of view from the last chapter, it is possible to incorporate gravitation into this scheme (even though, unlike the other forces, gravitation isn't renormalizable) as long as we accept it as a low-energy approximation that we expect to break down at sufficiently high energy.
$^{2}$ Before getting into the complicated story presented in this chapter, we note, following an argument by Sidney Coleman, that invoking a little quantum mechanics immediately allows us a quick route to gravitational redshift. In a uniform gravitational field, the free-particle Hamiltonian $H=p^{0}=m$ picks up an extra potential energy term $mgh$, where $V=gh$ is equal to the gravitational potential, so we can write the effect of gravity by saying $H \rightarrow H+HV$. The time evolution operator for quantum states, determined by the Hamiltonian, then changes in the presence of the potential according to
\begin{equation*}
\hat{U}=\mathrm{e}^{-\mathrm{i} \hat{H} t / \hbar} \rightarrow \mathrm{e}^{-\mathrm{i} \hat{H}(1+V) t / \hbar}. \tag{49.1}
\end{equation*}
Gravitation can thus be included in our equations by making the swap $t \rightarrow(1+V) t$ in the time parameter. This tells us that clocks run slowest deep down in the potential, where $h$ and therefore $V$ are smallest.
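To get a feel for the size of the effect implied by $t \rightarrow (1+V)t$, the following minimal sketch evaluates the fractional difference in clock rates between two heights. It assumes SI units, in which the dimensionless potential is $V = gh/c^{2}$ (the factor of $c^{2}$ is suppressed in the sidenote); the function names, the value of $g$ and the 22.5 m height are illustrative choices, not taken from the text.

```python
# Sketch: fractional clock-rate difference from t -> (1 + V) t.
# Assumption: SI units, so the dimensionless potential is V = g*h/c**2.

C = 299_792_458.0   # speed of light in m/s (exact by definition)
G_SURFACE = 9.81    # illustrative uniform field strength in m/s^2 (assumed)


def potential(height_m: float, g: float = G_SURFACE) -> float:
    """Dimensionless potential V = g*h/c^2 at a given height."""
    return g * height_m / C**2


def fractional_rate_difference(h_low: float, h_high: float,
                               g: float = G_SURFACE) -> float:
    """Rate of the higher clock minus rate of the lower one, to first order in V."""
    return potential(h_high, g) - potential(h_low, g)


# Illustrative height difference of 22.5 m
shift = fractional_rate_difference(0.0, 22.5)
print(f"fractional rate difference: {shift:.3e}")
```

The result is positive: the higher clock, where $V$ is larger, runs faster, in line with the statement that clocks run slowest deep in the potential.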
Fig. 49.1 (a) Two-dimensional square well. (b) The well compactified in the yy direction.
Example 49.1
Consider a particle in a one-dimensional square well, extended to include an extra dimension. The resulting two-dimensional well is shown in Fig. 49.1(a). It has length $L$ in the $x$ direction. We suppose that the reason the extra ($y$) dimension is not apparent is that it has been compactified, or curled up into a circle. To describe this we can identify $(x, y)$ and $(x, y+2 \pi a)$, and we have the situation shown in Fig. 49.1(b). The space is now a cylinder with circular cross section of circumference $2 \pi a$. The Schrödinger equation for a particle confined in this space is
\begin{equation*}
-\frac{\hbar^{2}}{2 m}\left(\frac{\partial^{2}}{\partial x^{2}}+\frac{\partial^{2}}{\partial y^{2}}\right) \psi=E \psi, \tag{49.2}
\end{equation*}
with eigenfunctions
\begin{equation*}
\psi_{n, l}(x, y)=a_{n} \sin \left(\frac{n \pi x}{L}\right)\left[b_{l} \cos \left(\frac{l y}{a}\right)+c_{l} \sin \left(\frac{l y}{a}\right)\right], \tag{49.3}
\end{equation*}
where $a_{n}$, $b_{l}$ and $c_{l}$ are sets of constants and $n$ and $l$ are integer quantum numbers. The resulting energies corresponding to these eigenfunctions are
\begin{equation*}
E_{n, l}=\frac{\hbar^{2}}{2 m}\left[\left(\frac{n \pi}{L}\right)^{2}+\left(\frac{l}{a}\right)^{2}\right]. \tag{49.4}
\end{equation*}
The quantum number $l$ can vanish, and so if we set it to zero we recover the usual one-dimensional energy levels. New energy levels only arise for $l>0$. If we assume the extra dimension is curled up into a very small circle, then $a \ll L$ and the second term in eqn 49.4 is large compared to the first. The result is that the extra energy levels lie at $E \approx \hbar^{2} / 2 m a^{2}$, which is potentially a very large energy indeed. So, since $a$ is assumed small, new energy levels appear only at very large $E$.
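The separation of scales in this example can be made concrete with a short numerical sketch. It assumes the standard spectrum for a well of length $L$ with a periodic direction of radius $a$, $E_{n,l} = (\hbar^{2}/2m)[(n\pi/L)^{2} + (l/a)^{2}]$, which reproduces the estimate $E \approx \hbar^{2}/2ma^{2}$ quoted above. The particle mass and the two lengths are illustrative values only.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J s


def energy(n: int, l: int, mass: float, length: float, radius: float) -> float:
    """Assumed spectrum E = hbar^2/(2m) * [(n*pi/L)^2 + (l/a)^2]."""
    return HBAR**2 / (2.0 * mass) * ((n * math.pi / length) ** 2
                                     + (l / radius) ** 2)


# Illustrative (hypothetical) numbers: an electron in a 1 nm well,
# with the extra dimension curled up at radius 1 fm.
M_E = 9.1093837015e-31   # electron mass in kg
L_WELL = 1.0e-9          # well length L in m
A_SMALL = 1.0e-15        # compactification radius a in m

ground = energy(1, 0, M_E, L_WELL, A_SMALL)      # ordinary 1D ground state
first_new = energy(1, 1, M_E, L_WELL, A_SMALL)   # lowest level with l = 1
ratio = first_new / ground

print(f"E(1,0) = {ground:.3e} J")
print(f"E(1,1) = {first_new:.3e} J")
print(f"ratio  = {ratio:.3e}")
```

With $a \ll L$ the ratio is of order $(L/\pi a)^{2}$, so the first level that knows about the extra dimension sits far above the familiar one-dimensional levels.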
The last example shows how extra dimensions, if compactified on a very small scale, might not necessarily give rise to measurable effects since they occur at such high energies compared to the capabilities of our measurements.
To get an idea of the length scales involved in these arguments, we can use a set of units originally formulated by Max Planck. René Descartes had originally proposed that there might be an underlying system of units that allowed the geometry of the Universe to be expressed in a natural manner. Around 250 years later, Planck proposed such a set of units, which combine three important constants of Nature and so link gravity, relativity, and quantum mechanics. The interaction of these different effects should be expected to occur on this Planck scale.
Example 49.2
The idea is to use dimensional analysis to express the gravitational constant $G$, the speed of light $c$, and the Planck constant $\hbar$ in terms of a basic length, mass, and time scale ($\ell_{\mathrm{P}}$, $m_{\mathrm{P}}$ and $t_{\mathrm{P}}$, respectively). From the dimensions of these constants we find
The fact that the underlying Planck length scale is so small gives us reason to pause: the effects of quantum mechanics (via expressions involving $\hbar$) and gravity (involving $G$) together are likely only to become clear in processes sharing this underlying scale of units.$^{3}$
In natural units, where $\hbar=c=1$, length is inversely proportional to momentum and hence also to energy. Exploring a length scale of $10^{-35}$ m implies we explore energies$^{4}$ of $10^{19}$ GeV, which is well beyond the measurement capability of any existing accelerator.$^{5}$ As a result, what is going on at the Planck length scale remains mysterious. It might well be, for example, that we have the situation proposed in the previous chapter, where extra dimensions are compactified on length scales of $\ell_{\mathrm{P}}$, and they have hence escaped our notice. So if extra dimensions are at least possible from this point of view, it has been suggested that these might provide the necessary backdrop to formulate a unified theory that includes quantum fields and also gravitation.
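The dimensional analysis of Example 49.2 can be checked numerically. The sketch below assumes the usual combinations $\ell_{\mathrm{P}} = \sqrt{G\hbar/c^{3}}$, $m_{\mathrm{P}} = \sqrt{\hbar c/G}$ and $t_{\mathrm{P}} = \sqrt{G\hbar/c^{5}}$, which are the unique products of powers of $G$, $c$ and $\hbar$ with dimensions of length, mass and time. The numerical constants are CODATA-style values entered by hand, and the variable names are illustrative.

```python
import math

# Physical constants in SI units
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0        # speed of light, m/s
HBAR = 1.054571817e-34   # reduced Planck constant, J s
EV = 1.602176634e-19     # one electronvolt in joules

# Unique combinations with dimensions of length, mass and time
planck_length = math.sqrt(G * HBAR / C**3)
planck_mass = math.sqrt(HBAR * C / G)
planck_time = math.sqrt(G * HBAR / C**5)

# Planck energy E_P = m_P c^2, converted to GeV
planck_energy_gev = planck_mass * C**2 / EV / 1.0e9

print(f"l_P = {planck_length:.3e} m")
print(f"m_P = {planck_mass:.3e} kg")
print(f"t_P = {planck_time:.3e} s")
print(f"E_P = {planck_energy_gev:.3e} GeV")
```

The energy obtained this way is the $1.22 \times 10^{19}$ GeV quoted in the sidenotes, and $\ell_{\mathrm{P}}/t_{\mathrm{P}}$ returns $c$, as it must.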
If we accept that it might be plausible for extra spatial dimensions to exist, what would be their effect on gravity? In four dimensions [i.e. $(3+1)$-dimensional spacetime], the law of Newtonian gravitation says
\begin{equation*}
\nabla^{2} \Phi^{(4)}=4 \pi G \rho, \tag{49.7}
\end{equation*}
where $\Phi^{(D)}(x)$ is the gravitational potential for $D$-dimensional spacetime.$^{6}$ By definition, the mass density is always given by the mass per unit volume of the $d$ spatial dimensions, so if we increase the number of dimensions, the constant of gravitation will have to alter, so that we have
\begin{equation*}
\nabla^{2} \Phi^{(D)}=4 \pi G^{(D)} \rho.
\end{equation*}
Given the dimensionality of the mass density $\rho$, this latter equation also implies on dimensional grounds that the gravitational force falls off as $1 / r^{d-1}$. That is, the force law differs in different numbers of dimensions.
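The $1/r^{d-1}$ scaling can be illustrated with a few lines of code. This is only a sketch of the scaling itself, assuming all $d$ spatial dimensions are large (uncompactified) so that the flux of the field spreads over a $(d-1)$-dimensional sphere; the function names are hypothetical.

```python
def falloff_exponent(d: int) -> int:
    """Power p in F ~ 1/r**p for d large spatial dimensions."""
    if d < 2:
        raise ValueError("need at least two spatial dimensions")
    return d - 1


def relative_force(r: float, d: int, r_ref: float = 1.0) -> float:
    """Ratio F(r)/F(r_ref) for a point mass in d spatial dimensions."""
    return (r_ref / r) ** falloff_exponent(d)


# Effect of doubling the separation in 3, 4 and 5 spatial dimensions
for d in (3, 4, 5):
    print(d, relative_force(2.0, d))
```

Doubling the separation weakens the force by a factor of 4 in three spatial dimensions but by a factor of 8 in four, which is why a departure from the inverse-square law at short distances would signal extra dimensions.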
Example 49.3
If there were five dimensions, with one compactified with radius $a$, we would have
$^{3}$ It is notable that the Planck mass is much larger than that of any elementary particle and seems the least experimentally inaccessible of the Planck units. The Planck energy scale $E_{\mathrm{P}}=m_{\mathrm{P}} c^{2}$ that derives from this is $1.22 \times 10^{28}$ eV (often quoted as $1.22 \times 10^{19}$ GeV), which is seven or eight orders of magnitude larger than the highest energy cosmic ray yet observed. One can speculate whether or not this observation is significant (see Exercise 49.1).
$^{4}$ It is simplest to compute $E_{\mathrm{P}}=m_{\mathrm{P}} c^{2}$, as mentioned in the previous sidenote.
$^{5}$ The Large Hadron Collider at CERN, a multi-billion dollar project, accelerates proton beams up to nearly 7 TeV per beam. This is incredibly impressive, but a TeV is only 1000 GeV. We are a long way off $10^{19}$ GeV.
$^{6}$ We use $D$ for the number of spacetime dimensions and $d=D-1$ for the number of spatial dimensions.
The Planck length can be defined in other dimensions. Using some dimensional analysis one can show
Fig. 49.2 String ending on a 2-brane.
$^{7}$ 'D' here stands for Dirichlet, or Peter Gustav Lejeune Dirichlet (1805-1859). Having fixed string ends is equivalent to a Dirichlet boundary condition $\partial X^{i} / \partial \tau=0$ at the end of the string.
$^{8}$ As a result of this striking feature, string theory has been viewed as the natural successor to QFT in attempting to find a consistent quantum gravity. It also has a natural description of an $S=2$ massless excitation, i.e. a graviton, leading some to suggest that string theory might have gravitation built into it.
Fig. 49.3 (a) A particle decay process in quantum field theory. (b) A particle decay in string theory.
where $\ell_{\mathrm{c}}$ is the characteristic length scale of the compactified dimension.
The generalization of the last example to $D$-dimensional spacetime is that
The quantity on the right is the volume of the extra dimensions $\mathcal{V}_{\mathrm{c}}$. This gives us an answer to how dimensionality changes the strength of gravitation.
49.2 String theory
The idea of extra dimensions is taken up in the formalism of string theory. This theory shares many of the techniques of QFT, but involves replacing the idea of fundamental particle-like excitations of fields with excitations of fundamental one-dimensional strings. In short, strings are one-dimensional objects with no internal structure, and particles are excitations of these strings. Strings are not built from parts and so we can't deal with some part of a string. Strings come in many forms: they can be open (meaning the ends do not join) or closed (where they form loops). Open strings can have free endpoints or fixed endpoints. The objects on which strings terminate to fix the ends are characterized by their dimensionality and are called D-branes.$^{7}$ An example of a string ending on a 2-brane is shown in Fig. 49.2.
First some good news: the use of strings instead of particles largely resolves one of the most persistent conceptual problems with QFT.
Example 49.4
A problem that bedevils QFT is the divergence of quantities and the consequent need for renormalization. A particle decay process in QFT is depicted in the Feynman diagram shown in Fig. 49.3(a), involving the interaction of fields (or particles) at a point. This is the source of the problem: the evaluation of the fields at the same point can cause contributions to perturbation theory to diverge, necessitating the removal of infinities using a number of sophisticated techniques, known as renormalization, whose validity has been the subject of persistent debate. Even if we trust renormalization, it fails to remove the infinities encountered in quantum gravity in the sorts of scattering processes described in Chapter 47. The QFT divergences are avoided to some extent in string theory, where particle processes resemble the example shown in Fig. 49.3(b). Here the particles, represented as closed strings, never meet at a point.$^{8}$
If we accept that strings might be of some use, then we should examine how to incorporate them into field theory. Fortunately, we can use many of the techniques developed in the course of this book to investigate strings. To get an idea how to do this, let's return to the relativistic description of particles and build a string theory from there.
A free particle of mass mm has a world line found by extremizing an action
Here the world line is parametrized by $\tau$. This parametrization is usually the proper time for massive particles, but could be chosen to be another affine parameter.$^{9}$ We use a set of coordinates $x^{\mu}$ to capture the motion of the particle, with which we can express the world line as a curve $x^{\mu}(\tau)$. The equation of motion derived from the particle's action is $\mathrm{d} p_{\mu}(\tau) / \mathrm{d} \tau=0$, where the momentum is $p_{\mu}=\partial L / \partial \dot{x}^{\mu}$.
The treatment of the string is very similar to that of the particle. The major difference is that since the string is a one-dimensional object, rather than a world line, it traces out a two-dimensional world sheet. Just as we parametrize the world line with a parameter $\tau$, we parametrize the world sheet with two parameters $\tau$ and $\sigma$. The names are chosen to allow us to think of the $\tau$ parameter as telling us about the behaviour with respect to time and of $\sigma$ as the coordinate telling us how far we are along the string.$^{10}$ For an open string $\sigma$ ranges from 0 to some maximum $\sigma_{*}$. The world sheet is potentially curved in an interesting way and so we seek to embed it in Euclidean space, in the same way that we describe in Appendix D.
We shall choose to work in Euclidean space with coordinates $x^{\mu}=(t, x, y, z)$. In this space, positions on the string's world sheet have coordinates $X^{\mu}=(T, X, Y, Z)$, so that a position vector of a point on the string world sheet is written as $\boldsymbol{X}(\tau, \sigma)$ [i.e. input $\tau$ and $\sigma$ labelling the part of the world sheet, return coordinates $x^{\mu}=(t, x, y, z)=(T, X, Y, Z)=X^{\mu}$ labelling the point on the world sheet]. The goal is to describe the world sheet of the string, some examples of which are shown in Fig. 49.4. To do this we embed the world sheet in Euclidean space, and then find the induced metric $\gamma$ of the world sheet. As discussed in Appendix D, the metric $\gamma$ has components given by
where the products are things like $\dot{\boldsymbol{X}} \cdot \boldsymbol{X}^{\prime}=\eta_{\mu \nu} \dot{X}^{\mu}\left(X^{\prime}\right)^{\nu}$, and so on. This provides us with a simple way to describe the world sheet.
Just as the action for a particle is proportional to the interval along the world line, the action for the string is proportional to the effective area of the world sheet. For the world sheet with a metric $\gamma$, the area $A$ is
$^{9}$ Indeed the ability to reparametrize the world line in Chapter 8 led to substantial simplifications. This is also the case in the next section.
$^{10}$ There is nothing forcing us to do this though. While such an interpretation isn't mandated, it provides the simplest description of the string's motion.
Fig. 49.4 (a) Open string world sheet. (b) Closed string world sheet.
where $\gamma$ is the determinant of the matrix $\gamma_{\alpha \beta}$. Using eqn 49.16 we can compute the determinant as $\gamma=(\dot{\boldsymbol{X}})^{2}\left(\boldsymbol{X}^{\prime}\right)^{2}-\left(\dot{\boldsymbol{X}} \cdot \boldsymbol{X}^{\prime}\right)^{2}$. Finally, the constant of proportionality that relates the area to the string action is the tension in the string $T_{0}$, which plays a role like the mass does in the particle theory.
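The determinant formula can be checked on simple world sheets. The sketch below assumes that the products are taken with $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ (a sign convention not stated explicitly in this passage) and that the induced metric is the $2 \times 2$ matrix of products of $\dot{\boldsymbol{X}}$ and $\boldsymbol{X}^{\prime}$; the helper names and the test world sheets are illustrative.

```python
import math

# Assumed signature for the dot product: eta = diag(-1, +1, +1, +1)
ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)


def dot(u, v):
    """Dot product of two 4-vectors with the assumed metric eta."""
    return sum(e * a * b for e, a, b in zip(ETA_DIAG, u, v))


def induced_metric(x_dot, x_prime):
    """2x2 matrix of products of the tau- and sigma-derivatives of X."""
    cross = dot(x_dot, x_prime)
    return [[dot(x_dot, x_dot), cross],
            [cross, dot(x_prime, x_prime)]]


def gamma_det(x_dot, x_prime):
    """Determinant (Xdot)^2 (X')^2 - (Xdot . X')^2."""
    g = induced_metric(x_dot, x_prime)
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


# Straight string along x at rest: X = (tau, sigma, 0, 0)
static = gamma_det((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0))

# Same string moving sideways with speed v: X = (tau, sigma, v*tau, 0)
v = 0.6
moving = gamma_det((1.0, 0.0, v, 0.0), (0.0, 1.0, 0.0, 0.0))

print(static, math.sqrt(-static))
print(moving, math.sqrt(-moving))
```

With this signature the determinant is negative for a physical world sheet, so the area element involves $\sqrt{-\gamma}$; for the moving string it is reduced by the factor $\sqrt{1-v^{2}}$ relative to the string at rest.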
With these ingredients we can write the resulting Nambu-Goto
Yoichiro Nambu (1921-2015) and Tetsuo Goto (1931-1982)
$^{11}$ Recall from Chapter 40 that the Euler-Lagrange equation for a Lagrangian density $\mathcal{L}$ built from scalar fields is
When written out, these look rather complicated, but they can be simplified, as we shall see in the next section.
49.3 Parametrizing the string
We can choose the parametrizations to simplify the equations of motion as much as possible, just as we did in Chapter 9. In this context, the choice of parametrization is known as choosing a gauge. First we make a choice of the timelike parameter $\tau$. The most direct choice is to treat this as the coordinate time by fixing $\tau=t$, which is a choice known as static gauge. In this gauge, lines of constant $\tau$ represent static strings, i.e. an observer [with coordinates $(t, x, y, z)$] will see a static string at a fixed time $t$, as shown in Fig. 49.5. In static gauge, the derivatives of the world sheet coordinates $X^{\mu}=\left(t, X^{i}\right)$ have components
where the 3-vector $\vec{X}$ has components $X^{i}$. Static gauge allows us to work in terms of the velocities (i.e. derivatives with respect to time $t=\tau$). The most useful of these is the transverse velocity that we discuss in the next example.
Example 49.6
Consider a static string at a fixed value of $t=\tau$. An element of string length can be written using an infinitesimal $\mathrm{d} s$, where
Here, $\sigma$ is the parameter along the length of the static string and the vector $\mathrm{d} \vec{X} / \mathrm{d} \sigma$ is tangent to the string. We can then deduce, using the previous equation, that
so the quantity $\partial \vec{X} / \partial s$ is a unit vector. The vectors $\partial \vec{X} / \partial \sigma$ and $\partial \vec{X} / \partial s$ are parallel since $\partial \vec{X} / \partial s=(\partial \vec{X} / \partial \sigma)(\partial \sigma / \partial s)$, so $\partial \vec{X} / \partial s$ is a unit tangent vector to the string.
To project out the part of a vector $\boldsymbol{v}$ that is perpendicular to a unit vector $\boldsymbol{n}$ we write $\boldsymbol{v}_{\perp}=\boldsymbol{v}-(\boldsymbol{v} \cdot \boldsymbol{n}) \boldsymbol{n}$. With this simple rule we can use the velocity $\vec{v}=\partial \vec{X} / \partial t$ and obtain the transverse velocity $\vec{v}_{\perp}$ thus
Although this is certainly an improvement, one final choice of sigma\sigma parametrization gives us the simplified equation of motion that we're after.
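The projection rule used in Example 49.6 to define the transverse velocity, $\boldsymbol{v}_{\perp}=\boldsymbol{v}-(\boldsymbol{v} \cdot \boldsymbol{n}) \boldsymbol{n}$ with $\boldsymbol{n}$ the unit tangent to the string, can be sketched as follows. The function names and the sample vectors are hypothetical; the tangent is normalized inside the function, so any vector parallel to $\partial \vec{X} / \partial \sigma$ can be supplied.

```python
import math


def dot3(u, v):
    """Ordinary Euclidean dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    """Return u rescaled to unit length."""
    norm = math.sqrt(dot3(u, u))
    if norm == 0.0:
        raise ValueError("tangent vector must be non-zero")
    return tuple(a / norm for a in u)


def transverse(velocity, tangent):
    """Part of velocity perpendicular to the string: v - (v.n) n."""
    n = unit(tangent)
    along = dot3(velocity, n)
    return tuple(v - along * c for v, c in zip(velocity, n))


# String locally along x, moving with velocity components along x and y
v_perp = transverse((1.0, 2.0, 0.0), (3.0, 0.0, 0.0))
print(v_perp)
```

The component of the velocity along the string is discarded, which reflects the fact that motion of a structureless string along its own length has no physical meaning.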
Example 49.7
We shall attempt to fix the parametrization of $\sigma$ such that the string velocity $\vec{v}$ is perpendicular to the string tangent, which would mean
This is useful as the transverse velocity becomes $\vec{v}_{\perp}=\partial \vec{X} / \partial t$, and, in terms of this $\vec{v}_{\perp}$, the momenta from eqns 49.25 can be rewritten as
This is a wave equation, whose solutions (the dynamics of the string) can be written down: they are simply plane waves.
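As a quick check that plane waves solve a wave equation of this kind, the sketch below tests a travelling wave against $\partial^{2} X / \partial t^{2}=\partial^{2} X / \partial \sigma^{2}$ using finite differences. It assumes units and a parametrization in which the wave speed along $\sigma$ is one, which is a simplification of the equation in the text; the wavenumber, amplitude and step size are arbitrary illustrative choices.

```python
import math


def plane_wave(t: float, sigma: float, k: float = 2.0,
               amplitude: float = 1.0) -> float:
    """One transverse component X(t, sigma) of a right-moving plane wave."""
    return amplitude * math.cos(k * (sigma - t))


def second_difference(f, x: float, h: float = 1.0e-3) -> float:
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


def wave_residual(t: float, sigma: float) -> float:
    """d2X/dt2 - d2X/dsigma2, which should vanish for a solution."""
    d2_t = second_difference(lambda tt: plane_wave(tt, sigma), t)
    d2_sigma = second_difference(lambda ss: plane_wave(t, ss), sigma)
    return d2_t - d2_sigma


residual = wave_residual(0.3, 1.1)
print(f"residual = {residual:.2e}")
```

Any function of $\sigma - t$ or $\sigma + t$ passes the same test, so general string motions can be built by superposing such waves.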
Our conclusion is that, with the right choice of parameters for the world sheet, the strings support wavelike excitations. These can be used to describe particle excitations. Further development of the theory would involve quantizing the string motion to extract the properties of the particles.$^{12}$ For example, the graviton can be identified as a vibrational state of a closed string.
49.4 Strings in relativity
Strings crop up in several places in general relativity. Most prominent are superstrings, which are strings with dimensions of the Planck length that possess supersymmetry.
$^{12}$ This is discussed in the book by Zwiebach (2009).
$^{13}$ The gravitational interaction between matter in a galaxy and dark matter leads to the formation of a dark matter halo surrounding each galaxy, inferred from the observed effect on the motion of stars orbiting the galaxy.
$^{14}$ All dark matter searches performed using detectors on Earth tacitly assume that candidate dark matter particles interact with ordinary matter by a non-gravitational force, e.g. electromagnetic coupling. Such experiments have not yet succeeded in detecting any such particles. But perhaps dark matter particles only interact gravitationally, explaining why they are so hard to find in particle physics experiments.
Example 49.8
Observations suggest that most of the matter in the Universe is not seen: it is dark matter, which does not interact with light. In fact, it only seems to interact via its gravitational effect,$^{13}$ explaining why it has never been directly detected.$^{14}$ One solution to this problem invokes supersymmetry. In supersymmetry, every boson in the Universe has a partner fermion of the same mass (and vice versa). We can then identify a symmetry operator with the property that
Acting on boson $A$ with $\hat{Q}$ gives fermion $A$-ino. (Example: $\hat{Q}$ acting on a photon $\gamma$ gives a photonino $\tilde{\gamma}$.) Acting on fermion $B$ gives boson s$B$. (Example: $\hat{Q}$ acting on a quark $q$ gives a squark $\tilde{q}$.)
Perhaps these new particles generated by supersymmetry compose dark matter. If so, the darkness of dark matter follows from quantum mechanical selection rules that suppress the probability amplitudes for matter to interact with dark matter. There is currently no experimental evidence for the existence of these extra particles.
Superstring theory allows the incorporation of supersymmetric particles into the theory. A generalization of the original superstring models is called $m$-brane theory, or $M$-theory, where $m$ is the number of dimensions. (The theory suggests that up to $m=9$ branes can exist.) It is hoped that $M$-theory could allow a consistent theory to be developed that incorporates gravitation and the other three fundamental interactions. However, it is a well-known problem that $M$-theory has not yet been able to make falsifiable predictions that can be tested by experiment. We have seen how extra dimensions should lead to a modification of the dependence of the gravitational force with distance and so, at the Planck length, measurable differences might be detectable. Of course, we have no means of making measurements at the Planck length at present.
In a completely different context, there are suggestions that there exists another sort of string in the Universe, this type potentially spanning large distances. These cosmic strings are created in the early Universe on a microscopic length scale and stretched out during the subsequent expansion. They would have a gravitational effect owing to their tension, meaning they could be detected via gravitational lensing (see Chapter 24). The lensing effect due to the string is predicted to cause a distant light source (e.g. a star) to show up as two images, owing to the curvature of spacetime around the string.
Example 49.9
The origin of cosmic strings follows from the phase transition in the early Universe that we discussed in Chapter 41. The simplest system showing a phase transition is that of the scalar field where, on cooling, the potential changes from that shown in Fig. 49.6(a) to that in Fig. 49.6(b), with the assumption that the system falls into one of the two minima, which occur at a field $\phi= \pm \phi_{0}$. However, in any phase transition, we have the possibility of forming domains. These result in the system breaking symmetry in different ways in different regions of space. That is, part of the system falls into one minimum in the potential, and part falls into another. For the scalar field we might have the situation shown in Fig. 49.7. On the left the system has fallen into the minimum in the broken symmetry potential at $-\phi_{0}$; on the right it has fallen into the minimum at $\phi_{0}$. The space between these two domains involves a field that must smoothly evolve between $-\phi_{0}$ and $\phi_{0}$. These regions are called defects. A one-dimensional defect, known as a wall or a kink, is shown in Fig. 49.7.
Fig. 49.6 The potential discussed in Chapter 41: (a) shows the high temperature potential; (b) shows the broken symmetry potential at low temperature.
Fig. 49.7 A domain wall linking two regions where symmetry is broken in different ways.
Fig. 49.8 The broken symmetry potential for the complex scalar field.
Fig. 49.9 The vortex field configuration.
$^{15}$ Further information on defects and vortices in field theory can be found in our Quantum Field Theory for the Gifted Amateur.
$^{16}$ The curvature around the cosmic string is examined further in the exercises.
A more interesting example takes place for the complex scalar field $\psi(x)$. The broken-symmetry potential-energy surface for this field is shown in Fig. 49.8. It is two dimensional, reflecting the two degrees of freedom that the complex field possesses (i.e. the real and imaginary parts). The potential resembles the punt at the bottom of a wine bottle, or a Mexican hat (it is sometimes called the Mexican-hat potential for this reason). There are an infinite number of minima in this potential that occur at the same radius $\left|\psi_{0}\right|$ in the complex plane, but at different values of the complex phase $\theta(x)$. In choosing a broken symmetry ground state, the complex field $\psi(x)=|\psi(x)| \mathrm{e}^{\mathrm{i} \theta(x)}$ picks a particular value $\left|\psi_{0}\right| \mathrm{e}^{\mathrm{i} \theta_{0}}$ by selecting the phase angle $\theta_{0}$.
The defects in this potential are shown (in two spatial dimensions) in Fig. 49.9, and are known as vortices.$^{15}$ Like the domain wall, the vortex can be understood as the simplest way in which the system can have different spatial parts in different minima of the potential. The resulting vortex has the feature that the gradient in the phase $\boldsymbol{\nabla} \theta(x)$ diverges at the centre of the vortex, giving rise to a singularity. To translate this picture into three dimensions, we imagine stacking vortices on top of each other, such that the singularities form a one-dimensional path. This curve through the vortex cores is the cosmic string.$^{16}$
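The way the phase wraps around a vortex core can be illustrated numerically. The sketch below assumes the simplest vortex configuration away from the core, $\psi = |\psi_{0}| \mathrm{e}^{\mathrm{i} n \theta}$ with $\theta$ the polar angle, $|\psi_{0}| = 1$ and integer $n$; this particular form is an assumption made for illustration rather than something given in the text. It counts how many times the phase winds on a loop around the origin.

```python
import cmath
import math


def vortex_field(x: float, y: float, winding: int = 1) -> complex:
    """Assumed vortex configuration far from the core, with |psi_0| = 1."""
    return cmath.exp(1j * winding * math.atan2(y, x))


def winding_number(field, radius: float = 1.0, steps: int = 400) -> int:
    """Net number of 2*pi turns of the phase around a circle about the origin."""
    total = 0.0
    previous = cmath.phase(field(radius, 0.0))
    for i in range(1, steps + 1):
        angle = 2.0 * math.pi * i / steps
        current = cmath.phase(field(radius * math.cos(angle),
                                    radius * math.sin(angle)))
        # Wrap each phase step into [-pi, pi) so branch-cut jumps are ignored
        step = (current - previous + math.pi) % (2.0 * math.pi) - math.pi
        total += step
        previous = current
    return round(total / (2.0 * math.pi))


print(winding_number(vortex_field))
print(winding_number(lambda x, y: vortex_field(x, y, winding=2)))
print(winding_number(lambda x, y: 1.0 + 0.0j))
```

For this configuration the phase changes by $2 \pi n$ on a loop of any radius $r$, so $|\boldsymbol{\nabla} \theta| = n / r$, which grows without bound as the loop shrinks onto the core. A uniform field, by contrast, has zero winding and no defect.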
Finally, it has been suggested that Schwarzschild black holes might in fact be strings, whose physics is then amenable to a description using string theory. There are some hints that the properties of such a black hole coincide with those of a string, although we do not yet have conclusive evidence for this.
49.5 Superspace
String theory represents only one of many possible approaches that attempt to reconcile general relativity and quantum mechanics. A very different strategy is to accept that the uncertainty inherent in the quantum-mechanical description of particle mechanics means that spacetime, with its rigid $(3+1)$-dimensional structure, can itself only be a classical approximation. In fact, it approximates a more subtle and complex state of affairs that allows quantum states to be realized with certain probability amplitudes $\psi$. This is quite a radical approach, in that it implies that the successful unification of quantum mechanics and gravitation involves abandoning the picture of a structured spacetime describing events as the basic arena of gravitation.
In order to build a more suitable foundation, we start with classical spacetime and note that we can deconstruct it by taking three-dimensional spacelike slices, or 3-surfaces ${}^{(3)} \mathcal{C}$. There is some freedom in how we do this, but this is subject to the constraint that we can rebuild spacetime by stacking up the slices in a well-defined manner. Next, in order to accommodate the probabilistic features of quantum mechanics, our collection of spacelike surfaces must be vastly increased to form a superspace. This superspace will include all possible 3-surfaces, each of which, in a quantum theory, occurs with a particular probability amplitude $\psi\left({}^{(3)} \mathcal{C}\right)$.
Example 49.10
The quantum properties of a system are described in terms of a quantum amplitude, which is determined through the combination of the phases of interfering wavelike contributions. In Feynman's 'sum over histories' description of quantum mechanics, ^(17){ }^{17} the phase of each contribution is determined by the classical action S(^((3))C)S\left({ }^{(3)} \mathcal{C}\right) of the corresponding configuration of 3 -space ^((3))C{ }^{(3)} \mathcal{C}, such that each contribution to the wavefunction take the form
To obtain the quantum probability amplitudes, we must then sum all of the possible $\psi({}^{(3)}\mathcal{C})$ that are compatible with the problem we're considering.
We obtain constructive interference when the actions for two configurations match up. This implies that the dynamics of quantum gravity can be determined by computing the details of how wavefronts of constant action $S$ propagate throughout the superspace. The equation governing this (entirely classical) propagation is known as the Einstein-Hamilton-Jacobi equation$^{18}$ and is given by
where $\gamma_{ij}$ are components of the three-dimensional metric describing a hypersurface and $\gamma$ is its determinant. Here, ${}^{(3)}R$ is the Ricci scalar for the 3-geometry. Remarkably, this one equation contains the same information as the Einstein field equation.
The fluctuations that characterize the quantum world (e.g. the zero-point fluctuations in a quantum oscillator) are expected to occur in the metric field on the scale of the Planck length. In the superspace approach, quantum fluctuations on this scale cause the probability amplitudes $\psi({}^{(3)}\mathcal{C}_i)$ for a range of 3-surfaces ${}^{(3)}\mathcal{C}_i$ to take on appreciable values. This leads to a fundamental limit to how well the spacetime picture adopted elsewhere in general relativity approximates the fluctuating reality of the underlying quantum world.
Ultimately, the superspace approach, with its abstract geometry containing all of the 3-surfaces and its classical equation of motion describing the phase corresponding to each, does not make any strong claims about the underlying structure of the interactions that allow quantum mechanics and gravitation to coexist and interact. This is, to its supporters, a positive aspect, since it represents a conservative and robust approach that relies on the metric field, instead of novel and unobserved features in Nature, such as those that are found in string theory.
49.6 Loop quantum gravity
An alternative approach to string theory that aims to combine quantum mechanics and gravitation is loop quantum gravity (LQG). This theory involves an attempt to quantize the geometry of spacetime itself. After all, lots of things in quantum mechanics become quantized, such as harmonic oscillator energy levels or angular momentum states, so why not spacetime geometry itself? The intuition is the following: imagine localizing a particle with precision $L$. Heisenberg uncertainty would demand that $L\,\Delta p > \hbar$, and since $(\Delta p)^{2} = \langle p^{2}\rangle - \langle p\rangle^{2}$, this means $\langle p^{2}\rangle > (\hbar/L)^{2}$. As we localize the particle in a smaller and smaller region, its momentum will go up and so will its energy $E$, and it will become ultra-relativistic so that $E \approx pc$ and hence $E$ will exceed $\hbar c/L$. Energy $E$ acts as gravitational mass via $E = Mc^{2}$, and a concentrated mass in this small region $L$ will become a black hole with Schwarzschild radius $R \sim GM/c^{2}$ (ignoring factors of order unity). This horizon will reach $L$ when $L = (G/c^{2})(\hbar c/L)/c^{2}$, i.e. $L = \ell_{\mathrm{P}}$, so that the particle is localized within a Planck length. Thus, we conclude that though spacetime might be smooth at length scales above the Planck length, anything below the Planck length is 'hidden inside its own mini-black hole'.$^{19}$
Fig. 49.10 (a) The quantization of a tetrahedral region of spacetime. (b) Overlapping volumes can be reduced to a graph in which closed paths over the nodes are the loops of the theory.
$^{17}$ In brief, Feynman's approach to quantum mechanics involves considering every possible path a particle can take in getting between two points. We compute the classical action $S_{i}$ for each trajectory and then build the quantum amplitude for the particle to travel between the two points by summing a factor $\mathrm{e}^{\mathrm{i} S_{i}/\hbar}$ for every possible trajectory, to give an amplitude $\mathcal{A} = \sum_{i} \mathrm{e}^{\mathrm{i} S_{i}/\hbar}$. In this way, the amplitude for a quantum mechanical process is built by a sum over all possible trajectories. This picture is described in more detail in our Quantum Field Theory for the Gifted Amateur (2014).
$^{18}$ The Hamilton-Jacobi equation in classical particle mechanics reads $H = -\partial S/\partial t$. It describes the evolution of the function $S$ (equal to the classical action) resulting from a Hamiltonian function $H$. This is the only formulation of classical mechanics that represents particle motion in terms of the properties of a wave with a phase determined by $S$. It is no coincidence, then, that Schrödinger's equation closely resembles the Hamilton-Jacobi equation. The version of the Hamilton-Jacobi equation in eqn 49.39, suitable for computations in general relativity, involves the evolution of the function $S$ with respect to the 3-metric components, driven by a term $\sqrt{\gamma}\,{}^{(3)}R$. More details of the Hamilton-Jacobi equation in classical particle mechanics can be found in Landau and Lifshitz (volume I, 1976).
$^{19}$ This phrase is from Rovelli and Vidotto (2015), a highly readable introduction to LQG.
$^{20}$ The total angular momentum (squared) is related to the operator $\hat{\vec{L}}^{2} = (\hat{L}^{1})^{2} + (\hat{L}^{2})^{2} + (\hat{L}^{3})^{2}$.
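The localization estimate can be checked numerically. The following is a minimal sketch, assuming CODATA values for the constants and dropping factors of order unity as in the argument above; the function names are illustrative and not part of the text. It finds the length $L$ at which the horizon size $\sim G E / c^{4}$ associated with the energy $E \sim \hbar c / L$ equals $L$ itself.

```python
import math

# Physical constants in SI units (CODATA values, quoted to limited precision).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8         # speed of light, m s^-1

def localization_energy(L):
    """Order-of-magnitude energy E ~ hbar c / L of a particle confined to size L."""
    return HBAR * C / L

def gravitational_radius(E):
    """Order-of-magnitude horizon size R ~ G M / c^2 with M = E / c^2."""
    return G * E / C**4

def planck_length():
    """Length at which R(E(L)) = L, i.e. L^2 = hbar G / c^3."""
    return math.sqrt(HBAR * G / C**3)

if __name__ == "__main__":
    l_p = planck_length()
    print(f"Planck length ~ {l_p:.3e} m")
    print(f"horizon size at L = l_P ~ {gravitational_radius(localization_energy(l_p)):.3e} m")
```

For lengths larger than this value the horizon estimate is smaller than $L$, so the mini-black hole only appears once the localization reaches the Planck scale.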
So how do we go about quantizing spacetime? LQG uses the quantum-mechanical intuition that it is the commutation relations between operators that give rise to quantization conditions. For example, the components of angular momentum $\hat{L}^{i}$ ($i = 1, 2, 3$) obey the commutation relation
$$[\hat{L}^{i}, \hat{L}^{j}] = \mathrm{i}\hbar\,\varepsilon^{ij}{}_{k}\,\hat{L}^{k},$$
and this leads to the quantization of angular momentum. Moreover, they lead to the interesting feature that you can know the (square of the) total angular momentum$^{20}$ and any one component of the angular momentum (such as $\hat{L}^{3}$), but not the other components of the angular momentum. In LQG, we try to do the same thing with an element of space, and we start with a very simple three-dimensional shape: the tetrahedron shown in Fig. 49.10(a). One way of describing this tetrahedron is by using four vectors [see Fig. 49.10(a)], which we will call $\vec{L}_{a}$ where $a = 1, 2, 3, 4$; the directions of these vectors are perpendicular to the four faces of the tetrahedron and their magnitudes are equal to the areas of the corresponding faces. These vectors satisfy the condition
$$\sum_{a=1}^{4} \vec{L}_{a} = 0, \qquad (49.42)$$
and one can show that the volume $V$ of the tetrahedron is given by $V^{2} = \frac{2}{9}\,\vec{L}_{1} \times \vec{L}_{2} \cdot \vec{L}_{3}$.
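Geometry itself can then be quantized by upgrading these vectors into operators and imposing commutation relations between their components. One possible quantization scheme for our tetrahedron would then be to write
$$[\hat{L}_{a}^{i}, \hat{L}_{b}^{j}] = \mathrm{i}\,\delta_{ab}\,\ell_{0}^{2}\,\varepsilon^{ij}{}_{k}\,\hat{L}_{a}^{k}. \qquad (49.43)$$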
Here $\ell_{0}$ is a constant, which must have dimensions of length (since $\hat{L}_{a}^{i}$ is an operator whose eigenvalue has the dimensions of area), and it turns out that it should be a constant multiplied by the Planck length. Equation 49.43 is simply a postulate, but what would it imply if we accepted it? The first thing to note is that the area of the faces of this tetrahedron would behave analogously to angular momentum in ordinary quantum mechanics. Thus, the area of the $a$th face of a tetrahedron must be quantized with eigenvalues
$$A_{a} = \ell_{0}^{2}\sqrt{j_{a}(j_{a}+1)},$$
with $j_{a} = 0, 1/2, 1, 3/2, \ldots$ The second thing one can conclude is that even if you know one Cartesian component of the area of one face of the tetrahedron, you won't know any of the other Cartesian components of the area of the other faces (for exactly the same reason that you can only know one component of the angular momentum). The normal vectors to each face of the tetrahedron are therefore known only partially and so these faces somehow blurrily shimmer with quantum uncertainty! We therefore deduce that geometry becomes fuzzy when you get down to the Planck scale; if your ambitions stretch to determining every aspect of the geometry of a shape at the smallest length scales, then you will be limited by fundamental quantum uncertainty. Thus, even though our argument has been formulated in terms of a tetrahedron, it would work if we had chosen some other shape and so we can't deduce that the smallest scale really does consist of a network of tetrahedra, since we can't have precise information about microscopic quantum geometry. This then is the consequence of LQG: the Riemannian geometry that gives rise to gravitation must be replaced with quantum geometry that involves an inherent uncertainty at the Planck scale in at least some lengths, angles, and areas. In order to describe the curved spacetime of gravitation, we must consider a mesh of spacetime volumes such as the tetrahedron discussed above. These meshes can be reduced to graphs whose lines are analogous to lines of force. The loops in LQG are the closed paths linking nodes in these graphs [Fig. 49.10(b)].
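By analogy with angular momentum, where the squared total angular momentum has eigenvalues $\hbar^{2} j(j+1)$, the face areas are expected to take the values $A = \ell_{0}^{2}\sqrt{j(j+1)}$. The sketch below lists the first few allowed areas in units of $\ell_{0}^{2}$. The spectrum formula is an assumption taken from that analogy, and the function names and choice of units are illustrative only.

```python
import math
from fractions import Fraction

def area_eigenvalue(j, l0=1.0):
    """Area l0^2 * sqrt(j (j + 1)) for spin label j (assumed angular-momentum-like spectrum)."""
    j = Fraction(j)
    # Only j = 0, 1/2, 1, 3/2, ... are allowed labels.
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("j must be a non-negative integer or half-integer")
    return l0**2 * math.sqrt(float(j * (j + 1)))

def area_spectrum(n_levels, l0=1.0):
    """First n_levels allowed (j, area) pairs, for j = 0, 1/2, 1, 3/2, ..."""
    return [(Fraction(k, 2), area_eigenvalue(Fraction(k, 2), l0)) for k in range(n_levels)]

if __name__ == "__main__":
    for j, area in area_spectrum(5):
        print(f"j = {str(j):>3}: A = {area:.4f} l0^2")
```

On this assumption the smallest non-zero area occurs at $j = 1/2$ and equals $(\sqrt{3}/2)\,\ell_{0}^{2}$, and the spacing between successive levels is not uniform.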
Let's return to the tetrahedron, and imagine we know the eigenvalue $j_{a}$ for each vector operator $\hat{\vec{L}}_{a}$, remembering that we also know that these vectors are subject to a closure property (eqn 49.42). The volume operator $\hat{V}$, defined via $\hat{V}^{2} = \frac{2}{9}\,\hat{\vec{L}}_{1} \times \hat{\vec{L}}_{2} \cdot \hat{\vec{L}}_{3}$, commutes with the closure operator $\hat{C}$ (defined by $\hat{C} = \sum_{a=1}^{4} \hat{\vec{L}}_{a}$) and so it turns out that we can write states as $|j_{1}, j_{2}, j_{3}, j_{4}, v\rangle$, a common eigenstate of the four total area operators and the volume operator (with eigenvalue $v$). Thus, there is a fundamental 'quantum of space', so that the tetrahedron can grow or shrink only in discrete steps.
LQG remains a theory under construction and so it is not yet clear how well it describes our observations. Perhaps most seriously, we currently do not have a semiclassical limit of the theory that recovers general relativity. In addition to not reproducing the physical predictions of general relativity, LQG has not yet given rise to any prediction not made by the Standard Model. As a result, the jury is still out on this theory, as it is on all of the quantum approaches to gravitation.$^{21}$
$^{21}$ Rovelli and Vidotto (2015) give much more detail and discussion of LQG in a highly engaging form. Readers interested in further alternative (and technical) approaches to quantum gravity can consult the vibrant literature on the subject. For example, for an introduction to twistor theory see the books by Penrose (2004) and by Zee (2013); for an introduction to Regge calculus see Misner, Thorne, and Wheeler (1973); for an introduction to spinors in relativity see Wald (1984) and also Misner, Thorne, and Wheeler (1973).
49.7 Anti-de Sitter spacetime
In studying quantum mechanics, we often use the idea of a particle confined to a box.$^{22}$ Clearly, confining the gravitational field to a box is not something we can straightforwardly do in our own spacetime. It is, however, possible to confine gravity in Anti-de Sitter (AdS) spacetime, whose geometry is related to the de Sitter spacetime we met in Chapter 15. A remarkable feature of $d$-dimensional AdS spacetime is that it possesses a spatial boundary made up of Minkowski spacetime with one fewer dimension.$^{23}$ AdS spacetime has caught the imaginations of many relativists, especially after the discovery that the physics of some gravitational theories in (4+1)-dimensional AdS spacetime ($\mathrm{AdS}^{5}$) can be mapped onto (3+1)-dimensional Minkowski space. This is known as the AdS/CFT correspondence$^{24}$ and is an active area of research into quantum theories of gravity. AdS spacetime is not a quantum theory of gravity in itself, but might be an important ingredient, and we discuss it in this section.
Fig. 49.11 de Sitter spacetime as a hyperboloid embedded in (4+1)-dimensional Minkowski spacetime.
$^{22}$ Our discussion in this section follows that of Zee, which can be consulted for further details.
$^{23}$ We shall write $d$-dimensional AdS spacetime as $\mathrm{AdS}^{d}$. The holographic principle was originally proposed by Gerard 't Hooft (1946- ). It says that the description of a $d$-dimensional volume of space can be encoded on its $(d-1)$-dimensional boundary (similarly to how a three-dimensional image is captured in a two-dimensional interference pattern in optical holography). $\mathrm{AdS}^{d}$ spacetime represents a particularly vivid example of the holographic principle.
$^{24}$ CFT = conformal field theory, which is to say, a conformally invariant gauge field theory. The AdS/CFT correspondence was originally proposed for string theory in AdS space by Juan Maldacena (1968-). The idea is that some theories of quantum gravity are equivalent to quantum theories with no gravitational interaction in fewer dimensions. It has been suggested that the correspondence might also provide insight into research areas such as condensed matter physics in the future. See Năstase (2017) for further details.
$^{25}$ In Chapter 15, we described a spacetime of constant curvature as having a Riemann tensor determined by the Ricci scalar. Equivalently, we described it in Chapter 16 as having a Riemann tensor with components $R_{\mu\nu\rho\sigma} = K\left(g_{\mu\rho} g_{\nu\sigma} - g_{\mu\sigma} g_{\nu\rho}\right)$. In this case, the constant $K = \alpha^{-2}$.
Before we describe AdS space, let's revisit the view of de Sitter spacetime discussed in the exercises for Chapter 18. We saw there that model universes driven by a non-zero cosmological constant $\Lambda$ can be represented geometrically using this spacetime. de Sitter spacetime can be visualized as a hyperboloid defined by
The embedding is shown in Fig. 49.11 with two dimensions suppressed. Going through the embedding routine from Appendix D, the result of eliminating a dimension is a line element with the form
where indices in the last equation run from 0 to 3. The topology of this spacetime is built from three-dimensional spheres $S^{3}$ that start at $T \rightarrow -\infty$ with infinite radius, shrink down to a minimum radius $\alpha$, before starting to expand again for $T \rightarrow \infty$. We call this topology $\mathbb{R}^{1} \times S^{3}$ (i.e. a real line representing the time combined with 3-spheres at every instant). This is a spacetime of constant curvature$^{25}$ and is consistent with a cosmological constant $\Lambda = R/4$ and an Einstein tensor with components $G_{\mu\nu} = -\frac{1}{4} R\, g_{\mu\nu}$, where $R > 0$.
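As a check on these relations, here is a short worked calculation. It is a sketch that assumes the standard constant-curvature form of the Riemann tensor in four dimensions with $K = \alpha^{-2}$, and the vacuum field equation written as $G_{\mu\nu} + \Lambda g_{\mu\nu} = 0$; the index placement is a choice made for this illustration.

```latex
\begin{align*}
R_{\mu\nu\rho\sigma} &= K\left(g_{\mu\rho}g_{\nu\sigma} - g_{\mu\sigma}g_{\nu\rho}\right),
  \qquad K = \alpha^{-2},\\
R_{\nu\sigma} &= g^{\mu\rho}R_{\mu\nu\rho\sigma}
  = K\left(4\,g_{\nu\sigma} - g_{\nu\sigma}\right) = 3K\,g_{\nu\sigma},\\
R &= g^{\nu\sigma}R_{\nu\sigma} = 12K = \frac{12}{\alpha^{2}},\\
G_{\mu\nu} &= R_{\mu\nu} - \tfrac{1}{2}R\,g_{\mu\nu}
  = -3K\,g_{\mu\nu} = -\tfrac{1}{4}R\,g_{\mu\nu},\\
\Lambda &= 3K = \frac{R}{4} = \frac{3}{\alpha^{2}}.
\end{align*}
```

Under these assumptions the result agrees with the form $G_{\mu\nu} = -(3/\alpha^{2})\,g_{\mu\nu}$ that appears in Exercise 49.11.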
We found in Chapter 18 that it is possible to represent models with different spatial curvatures by covering the hyperboloid with coordinates that make different cuts through the spacetime. This versatility of de Sitter spacetime follows from the high degree of symmetry that it possesses. In fact, another way of expressing its constant curvature is to say that de Sitter spacetime is an example of a maximally symmetric space. Here 'maximal symmetry' means that the space has the same number of symmetries as Euclidean space of the same dimension. A sphere also has this property and it is evident that the de Sitter spacetime can be thought of as a version of a higher dimensional sphere in Minkowski space, with the transformation $T^{2} \rightarrow -T^{2}$ providing a means of swapping between a sphere in Euclidean space and de Sitter geometry in Minkowski spacetime.
Anti-de Sitter spacetime can be thought of as de Sitter spacetime with $R < 0$, corresponding to a negative cosmological constant $-\Lambda$.$^{26}$ It has topology $S^{1} \times \mathbb{R}^{3}$ and can be represented as a hyperboloid
which is a (3+2)-dimensional Minkowski space with two timelike variables (i.e. variables that enter the metric with a minus sign). The embedding is shown in Fig. 49.12, with two dimensions suppressed. The spacetime resembles the de Sitter hyperboloid turned on its side. The embedding routine now results in a line element
with $\mu = 0$ to 3 again. In general, we can swap back and forth between results in de Sitter space and anti-de Sitter space by swapping $\alpha^{2} \rightarrow -\alpha^{2}$.
The unusual shape of AdS spacetime allows for the existence of closed timelike loops. This is undesirable owing to the violation of causality that it allows. However, the tube-like shape of AdS spacetime can effectively be cut and unrolled with a good choice of coordinates. (We say the topology has been changed to $\mathbb{R}^{4}$ as a result.) The standard choice of coordinates that does this for three-dimensional AdS spacetime ($\mathrm{AdS}^{3}$) is
We can produce a conformal version of the AdS line element using the ideas from Chapter 19. First, make the choice $r = \tan\psi$, then rewrite eqn 49.54 to say
Here $\psi$ is a latitude-like variable. As the radius-like coordinate $r$ goes from 0 to $\infty$, the latitude $\psi$ goes from 0 to $\pi/2$ (rather than $\pi$, as we might have expected). More colourfully, latitude in AdS spacetime goes from north pole to the equator, not to the south pole. With this observation, we have discovered the boundary of AdS spacetime!
Fig. 49.12 Anti-de Sitter spacetime as a hyperboloid embedded in (3+2)-dimensional Minkowski spacetime.
Fig. 49.13 (a) Anti-de Sitter spacetime with its boundary. (b) A massive particle bounces owing to the boundary in spacetime.
$^{26}$ It has a Riemann tensor with components $R_{\mu\nu\rho\sigma} = -\alpha^{-2}\left(g_{\mu\rho} g_{\nu\sigma} - g_{\mu\sigma} g_{\nu\rho}\right)$.
$^{27}$ See exercises for a derivation. Notice the resemblance to what we previously called the Poincaré line element: $\mathrm{d}s^{2} = (\mathrm{d}r^{2} + \mathrm{d}x^{2})/r^{2}$.
$^{28}$ This is of the form (kinetic energy) + (potential energy) = const.
$^{29}$ The latter is how it has been regarded for most of its lifetime. AdS spacetime was originally discussed in the 1920s by de Sitter (unhelpfully, both de Sitter and AdS were referred to as 'de Sitter spacetimes' for this reason). The spacetime was discovered independently by Tullio Levi-Civita.
$^{30}$ 'You may have enjoyed this course and decided that you would like to do your thesis research in general relativity. DON'T. Einstein spent the last thirty years of his life working on general relativity, and it led to nothing. And he was smarter than you.' So said Sidney Coleman (1937-2007), albeit in 1970.
To examine the consequence of the boundary seen in the last example, it is helpful to (once again!) recast AdS spacetime, this time in Poincaré coordinates$^{27}$ as
The boundary now occurs at $w = 0$. We can see from this coordinate system how a slice made at a particular value of $w$ is simply Minkowski space with one fewer spatial dimension. This idea is shown in Fig. 49.13(a), where AdS spacetime terminates on this flat boundary.
To see the physical influence of the boundary, we shall shoot photons and massive particles at it and see what happens.
Example 49.12
A light beam sent from a point $w_{0}$ to the boundary will, if reflected by a mirror at $w = 0$, come back after a coordinate time $2w_{0}$. Things are different for a massive particle. A massive particle obeys the usual condition on its velocity $\boldsymbol{u} \cdot \boldsymbol{u} = -1$, which gives us the equation of motion
where dots indicate a derivative with respect to proper time. Owing to the absence of the variable $t$ in the metric components we have a Killing vector $\partial/\partial t$ and a conservation law $u_{t} = g_{tt} u^{t} = \text{const.}$, or
Let's write this latter equation $\dot{t} = w^{2}/b$, with $b$ a constant length determined by the initial conditions. Substituting the conservation law back into eqn 49.57 we obtain an equation of motion in terms of the coordinate time$^{28}$ of
This describes the motion of a massive particle in a Newtonian potential with positive potential energy $V(w) = (b/w)^{2}$, which diverges as the boundary $w = 0$ is approached. As shown in Fig. 49.13(b), a particle set in motion in this potential never reaches the boundary. It must stop and turn back at the point $w = b$.
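The steps of this example can be sketched as follows. The calculation assumes a Poincaré-type line element $\mathrm{d}s^{2} = (-\mathrm{d}t^{2} + \mathrm{d}w^{2} + \mathrm{d}\vec{x}^{\,2})/w^{2}$, in units where the curvature scale is one, and motion purely along $w$. These are assumptions, chosen because they reproduce the relations $\dot{t} = w^{2}/b$ and $V(w) = (b/w)^{2}$ quoted in the example.

```latex
\begin{align*}
\boldsymbol{u}\cdot\boldsymbol{u} = -1
  \;&\Rightarrow\; \frac{-\dot{t}^{2} + \dot{w}^{2}}{w^{2}} = -1,\\
u_{t} = g_{tt}\,\dot{t} = -\frac{\dot{t}}{w^{2}} = \text{const.}
  \;&\Rightarrow\; \dot{t} = \frac{w^{2}}{b},\\
\dot{w}^{2} = \dot{t}^{2} - w^{2}
  \;&\Rightarrow\; \left(\frac{\mathrm{d}w}{\mathrm{d}t}\right)^{2}
  = \frac{\dot{w}^{2}}{\dot{t}^{2}} = 1 - \frac{b^{2}}{w^{2}},\\
\left(\frac{\mathrm{d}w}{\mathrm{d}t}\right)^{2} + \left(\frac{b}{w}\right)^{2} &= 1 .
\end{align*}
```

The turning point $\mathrm{d}w/\mathrm{d}t = 0$ sits at $w = b$, and since $(b/w)^{2}$ grows without bound as $w \rightarrow 0$, the particle cannot reach the boundary. For light, setting $\mathrm{d}s^{2} = 0$ in the same line element gives $\mathrm{d}t = \pm\mathrm{d}w$, which is the origin of the round-trip time $2w_{0}$.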
The last example hints at how the energy of a particle might be something that can be mapped to a characteristic value of $w = b$. In this way, the boundary of AdS space is able to encode the physics of the bulk.
Ultimately nobody yet knows whether AdS is an essential part of quantum gravity, or an intriguing curiosity.$^{29}$ Of course, the same could currently be said of each of the theories we have discussed in this chapter. It is perhaps not too optimistic to hope that, ultimately, experiment will provide the final word.$^{30}$
49.8 Our current best guess
Which model best describes our Universe? In the last few decades, cosmology has gone from being a highly speculative field (in the 1950s people were arguing about whether the Universe had a beginning or not) to one which is now heavily constrained by very well tied down measurements. Its practitioners claim that we have now entered the era of precision cosmology. The Cosmic Background Explorer (COBE) satellite revealed in 1992 that the cosmic microwave background (CMB) exhibits a beautiful blackbody spectrum with a temperature of 2.725 K, and this spectrum is pretty smooth across the sky (there are fluctuations but their amplitude is $\delta T/T \sim 10^{-5}$). In 2001, the Wilkinson Microwave Anisotropy Probe (WMAP) was launched to measure the angular spectrum of these fluctuations in greater detail, and 2009 saw the launch of the Planck satellite, which improved on these measurements even further. These data fit well with a Big-Bang cosmological model known as $\Lambda$CDM, where $\Lambda$ refers to the cosmological constant and CDM refers to cold dark matter.
Let's unpack these terms. First, the presence of $\Lambda$ in the model is consistent with the experimental observation that the expansion of the Universe is currently accelerating, which has been determined by measurements of type-Ia supernovae used as 'standardized candles'. An inflationary period in the Universe's history mandates that the Universe must be very close to its critical density, and the models show that $\Omega_{0}$ (the ratio of the Universe's density to the critical density) is $0.999(2)$. From a variety of measurements, it is found that the baryonic$^{31}$ matter in the Universe has a density of $\Omega_{\mathrm{B}} = 0.05$. Now we come to CDM, non-baryonic matter which resides in the halos of galaxies.$^{32}$ The CDM density comes out to be $\Omega_{\mathrm{M}} = 0.26$, much larger than the baryonic density, but the sum of the two does not yield $\Omega_{0} \approx 1$, so that $\Omega_{\Lambda} = 0.69$ has to make up the difference. The cosmological term has been termed dark energy; it is believed to have a very low density, but is completely uniform across all space$^{33}$ and so dominates the overall mass/energy of the Universe. It might be thought to be the energy of the quantum vacuum, though in 1968 Zel'dovich pointed out that though the energy of the quantum vacuum could contribute to $\Lambda$, it would result in an energy density fifty orders of magnitude larger than the critical density.
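The density budget described here is simple arithmetic, and a few lines of code make the bookkeeping explicit. This is only an illustrative sketch using the rounded values quoted in the text; the variable and function names are assumptions.

```python
# Density parameters quoted in the text (fractions of the critical density).
OMEGA_TOTAL = 0.999    # Omega_0, quoted as 0.999(2)
OMEGA_BARYON = 0.05    # Omega_B, ordinary (baryonic) matter
OMEGA_CDM = 0.26       # Omega_M as used in the text (cold dark matter)

def dark_energy_fraction(total, baryon, cdm):
    """Omega_Lambda needed so that the three components add up to the total."""
    return total - baryon - cdm

if __name__ == "__main__":
    omega_lambda = dark_energy_fraction(OMEGA_TOTAL, OMEGA_BARYON, OMEGA_CDM)
    print(f"matter (baryons + CDM): {OMEGA_BARYON + OMEGA_CDM:.2f}")
    print(f"Omega_Lambda needed:    {omega_lambda:.2f}")
```

Within the rounding of the quoted values, the remainder matches the $\Omega_{\Lambda} = 0.69$ given above.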
Here is our best guess of how all this fits together: at around $10^{-32}$ s after the Big Bang there is a period of inflation$^{34}$ in which the Universe expands exponentially, leading to a scale-invariant spectrum of gravitational waves (not yet detected in experiments, but which may well be found in the coming years). This is terminated only when the scalar field potential energy is converted into particles. The resulting quark soup condenses into hadrons at $t \approx 10^{-5}$ s, with baryons and antibaryons in roughly equal number, and as abundant as thermal photons. However, at $t \sim 1$ s, as the Universe cools below the temperature set by the mass of the lightest baryon, most baryons and antibaryons annihilate each other, leaving a small excess$^{35}$ of baryons over antibaryons in the few$^{36}$ that remain. From about 0.01 s to 20 minutes, we have a period known as Big-Bang nucleosynthesis (BBN) when the lightest elements form, mostly $^{4}\mathrm{He}$, but also some deuterium (D), $^{3}\mathrm{He}$, and $^{7}\mathrm{Li}$. The Universe cools further and the baryons fall into the gravitational potential wells produced by CDM particles. At around 380,000 years after the Big Bang, the Universe is much cooler, so neutral atoms start to form (the nuclei find electrons to orbit around them) and the Universe becomes transparent to photons; the CMB dates from this era, giving us a snapshot of the Universe at this time. The expansion of the Universe is dominated$^{37}$ by the ordinary matter and dark matter. Both types of matter are 'cold', meaning that they are non-relativistic and pressure-less fluids. The Universe is now reasonably well described by the Einstein-de Sitter model, which we called 'Universe 3' back in Chapter 18. From about 5 Gyr ago, the expansion of the Universe starts to become dominated by the cosmological constant term; in other words, dark energy takes over and the era of cosmic acceleration begins. The Universe is now believed to be $13.80(2)$ Gyr old and its expansion is accelerating.
$^{31}$ Reminder: Baryons are particles like protons and neutrons which are composites of quarks, but this is a shorthand for the 'ordinary' matter in the Universe.
$^{32}$ Hot dark matter models were ruled out fairly quickly. CDM is cold, meaning that the dark matter particles are moving at speeds $\ll c$, and so become trapped in the gravitational potential wells of galaxies. They interact gravitationally, but not via the strong or electromagnetic force (so we can't see them); they may or may not couple via the weak interaction. Since no-one has directly detected a CDM particle, we don't really know. The only reason we believe they are in halos around galaxies is the effect they have on the rotations of stars around galaxies via measurements that are known as rotation curves.
$^{33}$ This is unlike ordinary matter which is strongly clumped in stars and galaxies, with lots of regions of space empty of ordinary matter.
$^{34}$ Inflation is discussed in detail in Chapter 41.
$^{35}$ This is called baryogenesis. It is thought that non-equilibrium interactions that violate baryon number conservation, charge (C) conservation, and CP conservation, might allow the Universe to evolve a small net baryon abundance.
$^{36}$ Today there are only a few baryons per billion photons.
$^{37}$ There are also photons in the Universe, i.e. radiation. However, recall from Chapter 17 that cold matter density $\propto 1/a^{3}$, directly due to the volume expansion of the Universe, but radiation density $\propto 1/a^{4}$, which has an extra dependence on the scale factor given by $1/a$ due to cosmological redshift. Thus, the Universe becomes matter dominated, since the radiation density decreases more quickly.
$^{38}$ Thus, we have only a loose understanding of the physics occurring at energies corresponding to the era of baryogenesis, let alone the inflationary era.
Can we believe this $\Lambda$CDM picture? One impressive feature is that the numerical values of the various constants (e.g. the $\Omega$-values mentioned above) are pretty consistent between completely independent measurements based on (i) the gravity-driven acoustic oscillations in the CMB (which come from the surface of last scattering, determined at a time 380,000 years after the Big Bang) and on (ii) deuterium abundance (due to nuclear reactions that start to take place about one second after the Big Bang). The very early Universe involves some very high energies and temperatures$^{38}$ but the microphysics of the eras of (i) and (ii) above have been tested in the laboratory, so we can have some confidence that this picture might be right. $\Lambda$CDM is therefore the current 'standard model' of modern cosmology and has survived various tests unscathed. However, it is worth saying that there are still some wrinkles. The abundance of $^{7}\mathrm{Li}$ is not quite right, baryogenesis is not completely understood, and different measurements are giving slightly different values of the Hubble constant (they vary between about 67 and 74 $\mathrm{km\,s^{-1}\,Mpc^{-1}}$), a problem which is called Hubble tension. These problems might$^{39}$ go away following more detailed measurements, or perhaps following better understanding of some of the systematic errors. More significant is the fact that the physics of inflation is not well tied down, and our current model of inflation is, at best, an approximation; where does inflation actually come from? Even worse, we have not been able to detect any candidate dark matter particles in experiments (despite intensive and patient searches) and so we don't really know what dark matter is. And as for dark energy, we have even less idea. Is dark energy perhaps simply a phantom effect, with the real reason for cosmic acceleration being a consequence of whatever theory succeeds general relativity?
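To get a feel for the size of the Hubble tension, one can convert the two quoted values of the Hubble constant into the Hubble time $1/H_{0}$, a rough characteristic timescale that ignores the detailed expansion history. The sketch below is illustrative only; the conversion factors are standard, while the function name is an assumption.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds in 10^9 Julian years

def hubble_time_gyr(h0_km_s_mpc):
    """Hubble time 1/H0 in Gyr for H0 given in km s^-1 Mpc^-1."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC   # H0 in units of s^-1
    return 1.0 / h0_per_second / SECONDS_PER_GYR

if __name__ == "__main__":
    for h0 in (67.0, 74.0):
        print(f"H0 = {h0:.0f} km/s/Mpc -> 1/H0 = {hubble_time_gyr(h0):.1f} Gyr")
```

The two values give Hubble times of roughly 14.6 and 13.2 Gyr, which bracket the quoted age of the Universe, although the actual age depends on the full expansion history rather than on $1/H_{0}$ alone.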
However our understanding of the Universe evolves in the future, we can be reasonably confident that an unalterable part of the picture will be the existence of the Big-Bang singularity. Thus, in our final chapter, we will turn to the Big Bang and describe how this event fits naturally into our current best theory of gravity: general relativity.
Chapter summary
Extra dimensions could exist if compactified on the Planck length scale. They would have an effect on the nature of gravity.
String theory describes the dynamics of one-dimensional strings, whose excitations are quantum particles. The theory describes the world sheet of the string and leads to a wave-equation of motion.
Superspace offers a different approach, where classical spacetime must be replaced with a quantum-mechanical structure.
Another alternative approach to quantum gravity involving the quantization of spacetime is offered by loop quantum gravity.
Anti-de Sitter spacetime has a boundary made up of Minkowski spacetime with one fewer spatial dimension.
$\Lambda$CDM is the 'standard model' of modern cosmology and is supported by a great deal of experimental evidence. It might be correct.
Exercises
(49.1) Show that the wavelength of a particle with an energy equal to $m_{\mathrm{P}} c^{2}$ would be around the Planck length $\ell_{\mathrm{P}}$. Estimate the gravitational self-energy of such a particle, as well as its Compton wavelength and Schwarzschild radius. (Ignore factors of 2 and $\pi$.)
(49.2) By considering the string action, show that our string theory is invariant with respect to reparametrization.
(49.3) Verify eqn 49.9.
(49.4) Show that
where $\mathcal{H}$ is the Hamiltonian density.
(49.6) Verify eqn 49.35.
(49.7) Consider a very narrow tube around a cosmic string. We will solve the Einstein equation for this situation, which we shall assume has an energy-momentum tensor with components $T_{\mu \nu}=\operatorname{diag}(\rho, 0,0,-\rho)$, which is designed to be proportional to the Minkowski metric in the $(t, z)$ plane. The region is described by the line element
with constant $r_{0}$.
If we accept that this looks similar to the flat-space metric $\mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} r^{2}+r^{2} \mathrm{~d} \phi^{2}+\mathrm{d} z^{2}$, then $r_{0} \theta$ in eqn 49.63 can be treated as a sort of radial variable.
(a) Compute the components of the Ricci tensor and the Ricci scalar for this spacetime and show that Einstein's equation is satisfied.
(b) If the outer surface of the string region occurs at $\theta_{\mathrm{m}}$, compute the cross-sectional area of the string and hence its mass per unit length.
(49.8) Following from the previous problem, now consider the region outside the string. The metric outside the string can be written as
This has the property that it smoothly joins on to the interior metric from the last question at $\theta_{\mathrm{m}}$.
(a) By making a suitable transformation, show that this metric describes flat spacetime in cylindrical polar coordinates.
(b) Although the spacetime seems flat, it isn't really. What is the circumference of a large circle in this spacetime with $r^{\prime}=a \gg r_{0}$?
(49.9) Consider a tetrahedron in three-dimensional space whose vertices are given by the four vectors $\overrightarrow{0}, \vec{a}, \vec{b}$, and $\vec{c}$. Derive expressions for the four area vectors $\vec{L}_{1}, \vec{L}_{2}, \vec{L}_{3}$, and $\vec{L}_{4}$, and show that they satisfy a closure property (eqn 49.42).
(49.10) Show that the volume of the tetrahedron in the previous exercise is given by
(49.65)
(49.11) Show that de Sitter space solves an Einstein equation with components $G_{\mu \nu}=-\left(3 / \alpha^{2}\right) g_{\mu \nu}$. How does this differ for anti-de Sitter space?
(49.12) Consider a defining equation for $\mathrm{AdS}^{3}$ given by
(49.13) spacetime line element from eqn 49.56.
The Big-Bang singularity
I have shown that all the realms of the universe Are mortal, and the substance of the heavens Had birth; and I have explained most of those things That in heaven occur and must occur.
Lucretius (c. 100 BC-c. 50 BC) On the Nature of Things
The Robertson-Walker spaces, and many of the Friedmann universes that we have built from them, have one particularly notable feature: an initial Big-Bang singularity. In this chapter, we ask whether this is an artefact of our theory, or whether we should regard the Big Bang as a realistic event in our Universe. We shall show that, with a minimal set of assumptions, a singularity occurring at some time in the past is indeed a realistic, and possibly even an inevitable, prospect.$^{1}$
Let's construct a spacelike hypersurface $S$ in our spacetime at some fixed value of the cosmic time. We assume that it is a special sort of hypersurface known as a Cauchy surface. Such a surface has the property that the events on it completely determine some future surface, lying in what is known as the domain of dependence of $S$. If this is the case then it turns out there must exist a longest timelike curve from $S$ to some point on the domain of dependence.$^{2}$
The existence of such a longest timelike curve doesn't seem especially scandalous, but it is the subject of the argument in this chapter that reveals, in general terms, the necessity for a Big-Bang singularity to have started our Universe. Let's go to work.
50.1 Facts about Euclidean geometry
Let's first review a few useful facts about Euclidean 3-space. (We will then apply similar ideas to curved $(3+1)$-dimensional space.) We shall deal with a class of geodesics that all meet points on the surface $S$. Although these, being geodesics, all represent the shortest distance between some point $q$ and some specific point on the surface $S$, some will be shorter than others, depending on the point on $S$ on which they end.
Consider a path $\gamma$ between point $p$ and surface $S$. If the path is the shortest one possible between $p$ and $S$ then $\gamma$ must intersect $S$ orthogonally. If it doesn't, as in the curve $\gamma$ shown in Fig. 50.1, then there is a shorter path $\gamma^{\prime}$ that does meet the surface at right angles. We shall use the term 'orthogonal' to describe those geodesics that meet surfaces at right angles (they are also known as normal geodesics).
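The orthogonality property can be checked numerically in the simplest possible setting. The following is a minimal sketch, assuming a flat two-dimensional plane in which the surface $S$ is the $x$-axis and $p=(1,2)$; the function names and the brute-force scan are illustrative choices, not part of the argument in the text.

```python
import math

def path_length(p, x):
    """Straight-line distance from p = (px, py) to the point (x, 0) on S (the x-axis)."""
    return math.hypot(p[0] - x, p[1])

def shortest_landing_point(p, lo=-10.0, hi=10.0, steps=20001):
    """Scan candidate landing points on S and return the one giving the shortest path."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=lambda x: path_length(p, x))

p = (1.0, 2.0)
x_best = shortest_landing_point(p)

# The shortest straight path from p to S, and the tangent direction of S.
segment = (x_best - p[0], 0.0 - p[1])
tangent = (1.0, 0.0)
dot = segment[0] * tangent[0] + segment[1] * tangent[1]

print("landing point:", x_best)
print("segment . tangent:", dot)  # vanishes when the path meets S at right angles
```

Any other landing point gives a longer path, which is the flat-space version of the statement that a non-orthogonal curve can always be shortened.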
$^{1}$ In this chapter, we follow the argument in the form given by Geroch in General Relativity, 1972 Lecture Notes (2013). The techniques sketched here (sometimes called global techniques in their generalized form) are described in detail in Penrose (1973) and also in Wald (1984) and in Hawking and Ellis (1973).
$^{2}$ The necessity for this longest curve existing follows from a technicality: the fact that the collection of all timelike and null curves from a point on the domain of dependence to $S$ is compact. Compactness is discussed in Appendix C, but can be thought of here as implying that no curves go off to infinity. As a result, length is a continuous function on this space of curves, and must achieve a maximum.
Fig. 50.1 Curve $\gamma$ does not meet the surface $S$ at a right angle. The curve $\gamma^{\prime}$ that does is shorter.
Fig. 50.2 The shortest path between $q$ and the surface $S$ involves travelling along $\gamma$, rounding off the corner to avoid the crossing point at $r$ and then following $\gamma^{\prime}$ down to the surface $S$.
Fig. 50.3 The function $\phi$ measures the distance along the orthogonal geodesics from the surface $S$.
$^{3}$ This can be seen by writing $u_{\mu}=\partial \phi / \partial x^{\mu}$ and then writing $u_{\mu ; \nu}$ as $\left(\frac{\partial \phi}{\partial x^{\mu}}\right)_{; \nu}=\frac{\partial^{2} \phi}{\partial x^{\nu} \partial x^{\mu}}-\Gamma^{\alpha}{ }_{\nu \mu}\left(\frac{\partial \phi}{\partial x^{\alpha}}\right)$, which is manifestly symmetric since $\Gamma^{\alpha}{ }_{\mu \nu}=\Gamma^{\alpha}{ }_{\nu \mu}$.
Now consider two orthogonal geodesics $\gamma$ and $\gamma^{\prime}$ that cross at a point $r$ (Fig. 50.2). In this case, $\gamma$ can't be the shortest path from $q$ to the surface $S$. This is because we can construct a shorter curve by rounding off the corner at the crossing point $r$ and then following the other curve $\gamma^{\prime}$ down to $S$, as is shown in Fig. 50.2. This argument can be made a little more rigorous by recalling our discussion in Chapter 8 of whether the action for a particular trajectory is a minimum or not. We saw that the existence of a conjugate point along the trajectory guarantees that it does not represent the minimum. The crossing point of the two orthogonal geodesics in this example is just such a conjugate point.
We now turn to spacetime, where these facts will be used in a slightly modified form.
50.2 Orthogonal geodesics in spacetime
In curved spacetime, recall that the longest lines are the straightest. That is, owing to the sign of the metric the most proper time elapses along the lines with least acceleration. We therefore use the results argued for Euclidean space above, but with longest replacing shortest.
Consider again a surface $S$ and its orthogonal geodesics in spacetime. We want a way of measuring the distance from $S$ along all of the orthogonal geodesics. To do this, we define a function $\phi(x)$ that provides a measure of this distance for the geodesic that passes through the point with coordinates $x^{\mu}$, as shown in Fig. 50.3. We then define a velocity 1-form field $\tilde{\boldsymbol{u}}(x)$ for the orthogonal geodesics with components $u_{\mu}=\nabla_{\mu} \phi$. We work in a spacetime with a metric, so we can raise indices and form a velocity vector field $\boldsymbol{u}(x)$ with components $u^{\mu}$. The velocity components have the usual property $u^{\mu} u_{\mu}=-1$. From the definition of $\tilde{\boldsymbol{u}}(x)$ in terms of the scalar function $\phi(x)$, the components $u_{\nu ; \mu}$ of the covariant derivative of $\tilde{\boldsymbol{u}}$ are symmetric$^{3}$ with respect to exchange of $\mu$ and $\nu$ and so
\begin{equation*}
u^{\mu} u_{\nu ; \mu}=u^{\mu} u_{\mu ; \nu}=\frac{1}{2}\left(u^{\mu} u_{\mu}\right)_{; \nu}=0,
\end{equation*}
since $u^{\mu} u_{\mu}=-1$. Raising the $\nu$ index on $u^{\mu} u_{\nu ; \mu}=0$ we obtain $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}=0$. This implies that the velocity vector field $\boldsymbol{u}(x)$ is the tangent field of all of the orthogonal geodesics. We shall study the convergence $c$ of this field, defined as minus the divergence, or
\begin{equation*}
c=-u^{\mu}{ }_{; \mu} .
\end{equation*}
The rate of change of the convergence along the geodesics is
\begin{equation*}
\nabla_{u} c=u_{\alpha ; \nu} u^{\alpha ; \nu}+R_{\mu \nu} u^{\mu} u^{\nu} . \tag{50.9}
\end{equation*}
Let's pause and interpret this equation. It is telling us about the change in divergence of the world lines of the orthogonal geodesics as we move along them. This depends on two terms: the second is related to the curvature of spacetime via the Ricci tensor, which tells us about the way that curvature causes volumes to shrink. The first term is given by the quantity $u_{\alpha ; \nu}$, which is a symmetric matrix that we must effectively multiply by itself and then trace over the result. Although this seems a little abstract, the next example shows that this quantity obeys a neat inequality, which makes it very useful.
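For readers who want to see where an evolution equation of this kind comes from, here is a sketch of the standard derivation. It assumes the curvature convention $\left[\nabla_{\mu}, \nabla_{\nu}\right] u^{\alpha}=R^{\alpha}{ }_{\beta \mu \nu} u^{\beta}$ with $R_{\beta \nu}=R^{\mu}{ }_{\beta \mu \nu}$ (signs differ between texts, so this is an assumption rather than a statement of the convention used throughout), together with the geodesic condition $u^{\nu} \nabla_{\nu} u^{\mu}=0$ and the symmetry of $u_{\alpha ; \nu}$ established above.

\begin{align*}
\nabla_{u} c &=-u^{\nu} \nabla_{\nu} \nabla_{\mu} u^{\mu} \\
&=-u^{\nu}\left(\nabla_{\mu} \nabla_{\nu} u^{\mu}-R_{\beta \nu} u^{\beta}\right) \\
&=-\nabla_{\mu}\left(u^{\nu} \nabla_{\nu} u^{\mu}\right)+\left(\nabla_{\mu} u^{\nu}\right)\left(\nabla_{\nu} u^{\mu}\right)+R_{\beta \nu} u^{\beta} u^{\nu} \\
&=u_{\alpha ; \nu} u^{\alpha ; \nu}+R_{\mu \nu} u^{\mu} u^{\nu} .
\end{align*}

The first term in the third line vanishes because the curves are geodesics, and the symmetry of $u_{\alpha ; \nu}$ lets us write the remaining quadratic term as a matrix multiplied by itself and traced.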
Example 50.2
We separate out the trace and the trace-free part of the quantity $u_{\alpha ; \nu}$ by using the transverse projection operator$^{4}$ $P_{\alpha \beta}=g_{\alpha \beta}+u_{\alpha} u_{\beta}$. We write$^{5}$
Consider the second term on the right, which effectively instructs us to take the square of the traceless part and then trace over the result. One thing we can say about the number that results is that, if it is non-zero, it must be positive. This leads to the key inequality
\begin{equation*}
u_{\alpha ; \nu} u^{\alpha ; \nu} \geq \frac{c^{2}}{3} .
\end{equation*}
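The inequality at stake here is that the traced square of a symmetric spatial matrix is never smaller than one third of the square of its trace. A quick numerical sanity check is sketched below, assuming we work in a local frame where $u_{\alpha ; \nu}$ reduces to a symmetric $3 \times 3$ matrix (it has no components along $\boldsymbol{u}$). The helper names are illustrative only, and `theta` denotes the trace, so that $c^{2}=\theta^{2}$.

```python
import random

def trace(m):
    return sum(m[i][i] for i in range(3))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def random_symmetric(rng):
    """Random symmetric 3x3 matrix standing in for the spatial part of u_{alpha;nu}."""
    m = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m

def trace_split(b):
    """Return the trace, the traced square of b, and the traced square of its trace-free part."""
    theta = trace(b)
    sigma = [[b[i][j] - (theta / 3.0 if i == j else 0.0) for j in range(3)]
             for i in range(3)]
    full_square = trace(matmul(b, b))
    shear_square = trace(matmul(sigma, sigma))
    return theta, full_square, shear_square

rng = random.Random(0)
for _ in range(1000):
    theta, full_square, shear_square = trace_split(random_symmetric(rng))
    # tr(B^2) = theta^2 / 3 + tr(sigma^2), with tr(sigma^2) >= 0
    assert abs(full_square - (theta ** 2 / 3.0 + shear_square)) < 1e-12
    assert full_square >= theta ** 2 / 3.0 - 1e-12
print("inequality holds for all sampled matrices")
```

Equality holds only when the trace-free part vanishes, i.e. when the matrix is proportional to the identity.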
The term on the right is essentially an energy density, as seen by an observer with a tangent vector to their world line of $\boldsymbol{u}$. Remember from Chapter 13 that the weak energy condition says that this quantity must be positive. This means we can use eqn 50.12 to refine eqn 50.9 to read
\begin{equation*}
\nabla_{u} c \geq \frac{c^{2}}{3} \tag{50.14}
\end{equation*}
Physically, this says that gravity acts attractively, making world lines tend to converge along their length. This latter equation can also be solved. If the world lines are parametrized by a proper time $\tau$ then $\nabla_{u} \equiv \mathrm{D} / \mathrm{d} \tau$ and we need only solve the simple differential equation
\begin{equation*}
\frac{\mathrm{d} c}{\mathrm{d} \tau} \geq \frac{c^{2}}{3} .
\end{equation*}
Taking the boundary conditions to be that, at the surface $S$, we have $\tau=0$ and also $c(\tau=0)=c_{0}$, then the result of integrating this equation is
\begin{equation*}
\frac{1}{c} \leq \frac{1}{c_{0}}-\frac{\tau}{3} .
\end{equation*}
The solution tells us about the effect of gravity drawing the world lines together: the convergence becomes infinite by a proper time $\tau=3 / c_{0}$. An infinite convergence implies that the world lines have started to cross. This is not particularly surprising, since world lines are certainly allowed to cross, but it is interesting. It means the world lines form a caustic, which is an envelope around the congruence of world lines that forces them to cross at some point. This caustic is simply a consequence of the attractive nature of gravity.
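The blow-up time can be checked with a short numerical experiment. The sketch below treats the limiting case in which the inequality is saturated, $\mathrm{d} c / \mathrm{d} \tau=c^{2} / 3$, and integrates it with a fixed-step Runge-Kutta scheme until the convergence exceeds a large threshold. The step size, the threshold and the function names are arbitrary illustrative choices, and a positive initial convergence $c_{0}$ is assumed.

```python
def closed_form_convergence(tau, c0):
    """Solution of dc/dtau = c**2 / 3 with c(0) = c0, valid for tau < 3 / c0."""
    return 1.0 / (1.0 / c0 - tau / 3.0)

def numerical_blow_up_time(c0, dt=1e-4, c_max=1e6):
    """Integrate dc/dtau = c**2 / 3 with RK4 and return the time at which c exceeds c_max."""
    if c0 <= 0.0:
        raise ValueError("the argument assumes an initially converging congruence (c0 > 0)")

    def rhs(c):
        return c * c / 3.0

    tau, c = 0.0, c0
    while c < c_max:
        k1 = rhs(c)
        k2 = rhs(c + 0.5 * dt * k1)
        k3 = rhs(c + 0.5 * dt * k2)
        k4 = rhs(c + dt * k3)
        c += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        tau += dt
    return tau

c0 = 1.5
print("predicted blow-up time 3/c0:", 3.0 / c0)
print("numerical blow-up time     :", numerical_blow_up_time(c0))
```

Because the true convergence grows at least as fast as in this saturated case, the caustic in the general situation forms no later than $\tau=3 / c_{0}$.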
So we have a situation where every orthogonal geodesic must cross another by the time it reaches an interval $\tau=3 / c_{0}$. Recall our argument about crossing: if one orthogonal geodesic crosses another, then the first cannot be the longest timelike curve from a point beyond $S$. But we also know (from our other geometric fact) that the longest curve must be an orthogonal geodesic. These contradictory statements can only be reconciled by the following:
If we go farther than $\tau=3 / c_{0}$ from $S$ along any timelike curve to some point $p$, then there is no longest timelike curve from $p$ to $S$.
This statement implies that sufficiently far from $S$, points in spacetime cannot be joined by an extremal timelike curve.
This all seems very abstract, as indeed it is, but the argument can be put straightforwardly: if we go a distance $3 / c_{0}$ from $S$, then we reach a point where there is no possible longest timelike curve from $S$. However, recall from the start of this chapter that if $S$ is a Cauchy surface, there must be a longest timelike curve to a future point. This contradiction has only one resolution: it must simply not be possible to get more than $3 / c_{0}$ away from $S$ by following a timelike curve.
50.3 Our Universe
After the lengthy and abstract argument of the last section, we are ready to apply the result to our Universe, which is likely very close to one of the Robertson-Walker spaces filled with perfect fluid, and hence is well described by a Friedmann model.
Consider a small comoving volume of space of 3-volume $\mathcal{V}$ in a Friedmann model. Dimensionally speaking, the convergence $c$ must scale as $-\dot{\mathcal{V}} / \mathcal{V}$. Since the volume $\mathcal{V}$ itself varies as $a(t)^{3}$, where $a(t)$ is the expansion factor, we conclude that the convergence of the Universe can be taken to be
\begin{equation*}
c=-\frac{3 \dot{a}(t)}{a(t)} .
\end{equation*}
Since $\dot{a}(t)$ and $a(t)$ are positive we see that, as a consequence of the expansion of the Universe, the world lines of dust particles in our Universe are diverging, rather than converging, as we assumed in the argument above. Rather disappointingly, therefore, we must conclude that the argument is inapplicable. Before losing hope, we note that if we play time backwards then the sign of $\dot{a}$ flips, while $a$ does not, and so we can then apply the argument to the past of our Universe, even if it doesn't seem to work for the future.
So now we take $S$ to be the current spacelike hypersurface of the Universe. Following the argument above, it follows that it is not possible to trace a timelike curve backwards in time further than an interval in proper time of $\tau=3 / c_{0}$. Why has this happened? We have defined our theory of cosmology to take place on the smooth manifold that describes the theory of fields. As a result, the only point that could be reached by following a timelike curve back in time for an interval of $3 / c_{0}$ is one that does not live in the manifold at all. This point is the singularity for which we have been searching.
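To get a feel for the size of this bound, note that with time reversed the present-day convergence is $c_{0}=3 \dot{a} / a=3 H_{0}$, where $H_{0}$ is the Hubble constant, so that $3 / c_{0}=1 / H_{0}$. The following sketch evaluates this for the two ends of the range of $H_{0}$ quoted in the previous chapter (about 67 and $74 \mathrm{~km} \mathrm{~s}^{-1} \mathrm{Mpc}^{-1}$). The unit-conversion constants and the function name are our own illustrative choices, and the estimate inherits all the assumptions of the argument above.

```python
KM_PER_MPC = 3.0856775814913673e19  # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16        # seconds in 10^9 Julian years

def max_past_proper_time_gyr(h0_km_s_mpc):
    """Bound tau = 3 / c0 in Gyr, taking c0 = 3 * H0 for the time-reversed Universe."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC  # H0 converted to s^-1
    c0 = 3.0 * h0_per_second                  # present-day convergence, time reversed
    return (3.0 / c0) / SECONDS_PER_GYR

for h0 in (67.0, 74.0):
    tau_max = max_past_proper_time_gyr(h0)
    print(f"H0 = {h0:.0f} km/s/Mpc -> tau_max = {tau_max:.1f} Gyr")
```

The bound comes out at roughly 13 to 15 billion years, which is simply the Hubble time $1 / H_{0}$.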
The key here is that a singularity cannot conform to the demand that the manifold is smooth everywhere. It is a point that we must cut out of the spacetime if we are to describe the spacetime manifold using a field theory like relativity (Fig. 50.4). If we follow a timelike curve backwards and we reach a gap in the manifold then the curve cannot continue. This then is our initial Big-Bang singularity. In a Universe whose character is well approximated by the state of affairs that we have described in this chapter, a singular point in the past is therefore an inevitability. We conclude that the Universe starts (and this book ends) not with a whimper, but with a Big Bang.
Fig. 50.4 We are forced to cut out a singular point from a manifold, which must be a smooth space.
Chapter summary
A Cauchy surface $S$ has a longest timelike curve from $S$ to a point $p$ on the domain of dependence.
If a point $p$ is further from $S$ than $\tau=3 / c_{0}$ along any timelike curve then there is no longest timelike curve from $S$ to $p$.
A singularity in the spacetime manifold is the reason we cannot follow a timelike curve further back than $\tau=3 / c_{0}$ in a Robertson-Walker Universe.
An initial, Big-Bang singularity is therefore expected for a Robertson-Walker Universe on general grounds.
Exercises
(50.1) The $(0,2)$ projection tensor is given by
\begin{equation*}
\boldsymbol{P}=\boldsymbol{g}(\,,\,)+\tilde{\boldsymbol{u}}(\,) \otimes \tilde{\boldsymbol{u}}(\,),
\end{equation*}
where $\boldsymbol{u}$ is a velocity vector.
(a) Write the tensor in component form.
(b) Show that if we insert a vector $\boldsymbol{v}$ into $\boldsymbol{P}$ we project $\boldsymbol{v}$ into the 3-surface that is orthogonal to $\boldsymbol{u}$.
(c) Evaluate $P^{\mu \nu} P_{\mu \nu}$.
(d) Evaluate $P^{\alpha \beta} u_{\alpha ; \beta}$ for the case that $\boldsymbol{u}$ is tangent to a geodesic.
(e) If $\boldsymbol{n}$ is a unit spacelike vector, show that $\boldsymbol{P}=\boldsymbol{g}(\,,\,)-\tilde{\boldsymbol{n}}(\,) \otimes \tilde{\boldsymbol{n}}(\,)$ is the corresponding projection operator.
(50.2) The conventional derivation of the Raychaudhuri equation uses a very similar argument to the one in Section 50.2. We follow the approach of Zee here, which can be referred to for further details. Consider a congruence of timelike geodesics, parametrized by proper time $\tau$, with coordinates $x^{\mu}\left(\tau, \sigma^{1}, \sigma^{2}, \sigma^{3}\right)$ and tangent vectors $\boldsymbol{u}=\left(\partial x^{\mu} / \partial \tau\right) \boldsymbol{e}_{\mu}$. The vectors $\boldsymbol{W}_{i}=\left(\partial x^{\mu} / \partial \sigma^{i}\right) \boldsymbol{e}_{\mu}$ span a three-dimensional subspace, which can be projected into using the operator $\boldsymbol{P}$ from the previous question.
(a) Show that the rate of change of $\boldsymbol{W}_{i}$ along the congruence is given by
where $B_{\mu \nu}=u_{\mu ; \nu}$. The tensor $\boldsymbol{B}$ therefore measures the failure of the vector $\boldsymbol{W}_{i}$ to be parallel transported along the congruence.
(b) Show that $\boldsymbol{B}$ has the properties that $u^{\mu} B_{\mu \nu}=0$ and $B_{\mu \nu} u^{\nu}=0$, telling us that $\boldsymbol{B}$ also lives in the same three-space as $\boldsymbol{W}_{i}$.
The tensor $\boldsymbol{B}$ is conventionally split into: (i) a trace
which is known as the Raychaudhuri equation and is very useful in proving singularity theorems. Often the expansion parameter $\theta$ is useful to tell us if a congruence is expanding or contracting. Since $\mathrm{D} \theta / \mathrm{d} \tau=g^{\mu \nu}\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{B}\right)_{\mu \nu}$, we can contract the result from (c) using $g^{\mu \nu}$.
(d) Show that
for all timelike vectors $\boldsymbol{v}$, then all of the geodesics approach each other, i.e. gravitation has a focussing effect on the congruence.
A
Further reading
Books must follow sciences, and not sciences books
Francis Bacon (1561-1626)
In my situation as Chancellor of the University of Oxford, I
have been much exposed to authors
Arthur Wellesley, Duke of Wellington (1769-1852)
There are many excellent books on general relativity, cosmology, geometry and related fields.$^{1}$ Like many introductory books this one contains a compilation of many arguments, explanations and examples formulated and presented by other authors. Our sources are discussed at the end of this appendix.
In learning most subjects, one usually benefits from having read $\geq 2$ books and many of those mentioned in this chapter would provide a good supplement to this one. We like Geroch and Spivak's insightful explanations, Penrose (2004) and Hartle's approachability and Wald's precision. For learning general relativity, some recommended choices that share a similar approach to this book include: (i) Geroch (1978), Penrose (2004), Schutz (1985) and Hartle at an introductory level; (ii) d'Inverno, Guidry, Hobson/Efstathiou/Lasenby, and Zee at an intermediate level; and (iii) Geroch (2013, General Relativity), Hawking/Ellis, Landau/Lifshitz (1975), Misner/Thorne/Wheeler and Wald at an advanced level. For learning differential geometry we recommend: (i) Penrose (2004) and Schutz (1980) at an introductory level; (ii) Misner/Thorne/Wheeler at an intermediate level; and (iii) Geroch (1985 and 2013, Differential Geometry) and Spivak (2005) at a more advanced level (with the latter very suitable for readers with a background in mathematics). The problem books by Moore (at an elementary level) and by Lightman et al. and Blennow/Ohlsson (intermediate/advanced) are also warmly recommended. We have followed their approaches in some examples and exercises.
One thing to beware of in books on general relativity is the different sign conventions adopted. These change a number of the key equations. There are four conventions to watch out for (three of which are independent). We list the conventions of several books below.
The sign $s_{1}$ in front of the line element of the metric$^{2}$ $g$
Here are some of the sign conventions used in well-known books.$^{3}$
$^{3}$ References are given at the end of this appendix.
Further reading by chapter:
Most of the topics covered in this book are also discussed in the standard references on general relativity. The further reading list given below is based on books that use a similar approach to us, and those whose presentation we've followed in some of our arguments. Some are at an introductory and some at a more advanced level.
Chapter 1: an accessible introduction to special relativity can be found in French and in Geroch (1978); a useful summary is given in Landau and Lifshitz (vol. II). Links between geometry and special relativity are discussed in Ellis/Williams. Chapter 2: vectors are discussed in Boas and in Penrose (2004); their use in relativity is covered in Hartle and in Schutz (1985). Chapter 3: coordinate transformations are introduced in French and in Schutz (1985). Chapter 4: 1-forms are introduced in Schutz (1980 and 1985), Misner/Thorne/Wheeler and in Guidry. Ludvigsen gives a geometrical introduction to the energy-momentum tensor. Chapter 5: metrics are introduced in Hartle and Zee. Chapter 6: the principles of relativity are discussed in all books on relativity, and in most detail by Weinberg (1972). A historical account can be found in Pais. See Einstein for a collection of the original papers; these are put in context by Cheng. Chapter 7: the covariant derivative and connection coefficients are discussed by Schutz (1985) and in Misner/Thorne/Wheeler. Chapters 8 and 9: the method used to extract connection coefficients can be found in Zee and in Hartle. Chapter 10: the importance of the vielbein is stressed in Hartle, whose approach and notation we follow. They are used extensively in Lightman et al. Chapter 11: an introduction to Riemann curvature is found in Hartle and in Misner/Thorne/Wheeler. Chapter 12: an intuitive introduction to the energy-momentum tensor is given in Hartle. Misner/Thorne/Wheeler provides lots of insight and useful diagrams. Chapter 13: the construction of the Einstein equation is justified in Schutz (1985). See Feynman (1995) for a rather different approach. Einstein's route to the field equation is described in Pais. Chapter 15: cosmology is introduced in Lambourne. For a full account see Peacock and Weinberg (2006). Chapter 16: Robertson-Walker spaces are introduced in Lambourne and described in detail in Misner/Thorne/Wheeler.
More detail on hyperbolic spaces is covered in Penrose (2004) and in Needham (1997). Chapters 17 and 18: cosmological models are introduced in Penrose (2004) and, more systematically, in Lambourne. Chapter 19: conformal infinities and singularities are outlined in d'Inverno. Singularity theory is described at a more advanced level in Hawking/Ellis (whose presentation we follow in a highly simplified form) and in Penrose (1972). Chapter 20: Newtonian orbits are analysed in French and Ebbison. An advanced (but fascinating!) take is Gutzwiller. Chapter 21: the Schwarzschild geometry is introduced in Hartle, in Schutz (1985) and in Lambourne. Misner/Thorne/Wheeler gives a complete account. Chapters 22, 23 and 24: motion in the Schwarzschild geometry is discussed in Hartle (whose approach and notation we follow), in Moore and in Misner/Thorne/Wheeler. Chapter 25:
black holes are introduced in Blundell, and treated in all modern general relativity texts. See Hartle and Schutz (1985) for introductory treatments and Misner/Thorne/Wheeler for a more advanced discussion. Chandrasekhar gives a complete account (including a clear discussion of much of the material in this part of the book), albeit at a very advanced level. Chapters 26 and 27: black hole singularities are clearly explained in Wald, and we follow this approach. The analogy with accelerating Minkowski coordinates is discussed in Rindler. Wormholes are discussed in Misner/Thorne/Wheeler. Chapter 28: Hawking radiation is explained in Schutz (1985) and in Zee. For black hole thermodynamics, see Page (2005) and Carlip (2014). Chapter 29: charged and rotating black holes are introduced in Hartle. Our discussion of the Kerr metric follows Schutz (1985). Hawking/Ellis supplies additional insight. Chapter 30: classical curvature is introduced from a visual perspective in the wonderful book by Needham (2021), in Zee and (from a historical perspective) in Weinberg (1972). A full account is given in Lipschutz. Spivak (1999) gives translations of the key papers by Gauss and Riemann, along with Spivak's characteristically insightful commentary. Chapters 31, 32 and 37: modern geometry is introduced in Needham (2021), in Misner/Thorne/Wheeler and in Schutz (1980). We follow Misner/Thorne/Wheeler's presentation and notation in this part of the book. An introduction to the formal mathematics underlying this subject is given in Spivak (1971). See Spivak (2005) for the full story on all of the topics in this section. Chapter 33: an accessible introduction to the Lie derivative is found in Penrose (2004). The discussion in Schutz (1980) is also very accessible at an intermediate level. 
Chapters 34 and 35: the geometrical approach to the covariant derivative and the Riemann tensor is discussed in Penrose (2004) and Needham (2021) at an introductory level, and in Misner/Thorne/Wheeler at an advanced level. Spivak (2005) fills in the mathematical details. Hawking/Ellis provides lots of insight. Chapter 36: Cartan's method is explained in Needham (2021), and in more detail (with examples) in Misner/Thorne/Wheeler and in Nakahara. Some more applications can be found in Lightman et al. Chapter 38: chains are introduced very clearly in Ryder (1985). For the full story see Spivak (1971 and 2005). Chapter 39: a full account of fluid mechanics is given in Landau/Lifshitz (vol. VI). An introduction can be found in Feynman/Leighton/Sands (vol. II) and in Thorne/Blandford. Chapter 40: quantum field theory is described in Lancaster/Blundell. Some advanced topics are covered in Wald and in Padmanabhan. Chapter 41: inflation is discussed in Peacock. A more advanced discussion can be found in Padmanabhan. Fine tuning is considered in Lewis and Barnes. Chapter 42: the geometrical interpretation of electromagnetism is well described in Misner/Thorne/Wheeler. Its treatment as a field theory is discussed in Lancaster/Blundell. Chapter 43: the geometric view of the Bianchi identity is covered at an introductory level in Ryder (1985). See Misner/Thorne/Wheeler, whose approach we follow, for the full story. Chapter 44: gauge theory is discussed in Lancaster/Blundell and in Ryder (1985), whose approach we
follow. A nice introduction is given in Penrose (2004). Chapter 45: the weak-field limit is discussed at an introductory level in Schutz (1985) and in more detail in Misner/Thorne/Wheeler. Feynman (1995) has a characteristically interesting take, as does Geroch (2013, General Relativity). We follow Ryder's (2009) discussion of the Lense-Thirring effect in the problems. Chapter 46: gravitational waves are introduced clearly in Schutz (1985), whose approach we follow. A complete and modern treatment can be found in Thorne/Blandford. Chapter 47: the properties of gravitons in a quantum field theory are discussed in similar terms in Feynman (1995). Chapter 48: Kaluza-Klein theory is introduced in Zee, whose discussion we follow. Chapter 49: string theory is introduced in Zwiebach. Loop quantum gravity is described in Rovelli/Vidotto. A short history of the latter field is given in the review by Ashtekar. We follow Zee's discussion of particles in the AdS spacetime. Chapter 50: the argument we discuss is given in more detail in Geroch (2013, General Relativity). More detail on the methods can be found in Penrose (1973), Wald and also in Hawking/Ellis. Appendix C: good books on topological spaces include all of the lecture note volumes by Geroch (his course on Topology is the simplest), with more mathematical treatments available in Nakahara and in the book by Nash and Sen. See Penrose (2004) for a basic introduction to this material and Spivak for the full story. Geroch's book Mathematical Physics takes things further for the physicist. Appendix D: we follow Zee and Hartle's very clear discussions of embedding.
Bibliography
V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd edition, Springer, New York (1989).
A. Ashtekar, Quantum Gravity, arXiv:gr-qc/0410054v2 (2004).
M. Blennow and T. Ohlsson, 300 Problems in Special and General Relativity, CUP, Cambridge (2022).
K. M. Blundell, Black Holes, a Very Short Introduction, OUP, Oxford (2015).
M. L. Boas, Mathematical Methods in the Physical Sciences, 2nd edition, Wiley, New York (1983).
C. G. Böhmer, Introduction to General Relativity and Cosmology, World Scientific, London (2016).
H. R. Brown, Physical Relativity, OUP, Oxford (2006).
S. Carlip, Int. J. Mod. Phys. D 23, 1430023 (2014) [arXiv:1410.1486].
S. Carlip, General Relativity, a Concise Introduction, OUP, Oxford (2019).
S. Carroll, Spacetime and Geometry: An Introduction to General Relativity, CUP, Cambridge (2019).
S. Chandrasekhar, The Mathematical Theory of Black Holes, OUP, Oxford (1992).
Y. Choquet-Bruhat, C. DeWitt-Morette, and M. Dillard-Bleick, Analysis, Manifolds and Physics, North-Holland, Amsterdam (1977).
Y. Choquet-Bruhat, Introduction to General Relativity, Black Holes and Cosmology, OUP, Oxford (2015).
S. Coleman, Sidney Coleman's Lectures on Relativity, CUP, Cambridge (2022).
R. d'Inverno, Introduction to Einstein's Relativity, OUP, Oxford (1992).
A. Einstein, The Principle of Relativity, Dover, New York (1952).
G. F. R. Ellis and R. M. Williams, Flat and Curved Space-Times, (2nd edition), OUP, Oxford (2000).
R. P. Feynman, Feynman Lectures on Gravitation, Penguin, London (1995).
R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics, Vol. II, Pearson Addison Wesley, San Francisco (2006).
J. Foster and D. J. Nightingale, A Short Course in General Relativity, 3rd edition, Springer, New York (2010).
T. Frankel, The Geometry of Physics, 2nd edition, CUP, Cambridge (2004).
A. P. French, Special Relativity, Chapman and Hall, London (1968).
A. P. French and M. G. Ebison, Introduction to Classical Mechanics, Chapman and Hall, London (1986).
R. Geroch, General Relativity from A to B, University of Chicago Press, Chicago (1978).
R. Geroch, Differential Geometry, 1972 Lecture Notes, Minkowski Institute Press, Montreal (2013).
R. Geroch, General Relativity, 1972 Lecture Notes, Minkowski Institute Press, Montreal (2013).
R. Geroch, Geometrical Quantum Mechanics, 1974 Lecture Notes, Minkowski Institute Press, Montreal (2013).
R. Geroch, Topology, 1978 Lecture Notes, Minkowski Institute Press, Montreal (2013).
R. Geroch, Mathematical Physics, Chicago University Press, Chicago (1985).
N. Gray, A Student's Guide to General Relativity, CUP, Cambridge (2019).
Ø. Grøn and S. Hervik, Einstein's General Theory of Relativity, Springer, New York (2007).
M. Guidry, Modern General Relativity, CUP, Cambridge (2019).
M. C. Gutzwiller, Chaos in Classical and Quantum Mechanics, Springer-Verlag, New York (1990).
J. B. Hartle, Gravity: an Introduction to Einstein's General Relativity, Pearson, Harlow (2014).
S. W. Hawking and G. F. R. Ellis, The Large Scale Structure of Space-time, CUP, Cambridge (1973).
M. P. Hobson, G. Efstathiou, and A. N. Lasenby, General Relativity, CUP, Cambridge (2006).
L. P. Hughston and K. P. Tod, An Introduction to General Relativity, CUP, Cambridge (1990).
R. J. A. Lambourne, Relativity, Gravitation and Cosmology, CUP, Cambridge (2010).
T. Lancaster and S. J. Blundell, Quantum Field Theory for the Gifted Amateur, OUP, Oxford (2014).
L. D. Landau and E. M. Lifshitz, Mechanics (volume I of Landau and Lifshitz), Pergamon, Oxford (1976).
L. D. Landau and E. M. Lifshitz, Classical Theory of Fields (volume II of Landau and Lifshitz), Pergamon, Oxford (1975).
L. D. Landau and E. M. Lifshitz, Fluid Mechanics (volume VI of Landau and Lifshitz), Pergamon, Oxford (1987).
G. F. Lewis and L. A. Barnes, A Fortunate Universe, Cambridge University Press, Cambridge (2016).
A. P. Lightman, W. H. Press, R. H. Price, and S. A. Teukolsky, Problem Book in Relativity and Gravitation, Princeton University Press, Princeton (1975).
S. Lipschutz, Schaum's Outline of Differential Geometry, McGraw-Hill, New York (1969).
M. Ludvigsen, General Relativity, CUP, Cambridge (1999).
M. Maggiore, Gravitational Waves, OUP, Oxford (2007).
C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravitation, W. H. Freeman and company, New York (1973).
T. A. Moore, A General Relativity Workbook, University Science Books, Mill Valley, CA (2013).
V. F. Mukhanov and S. Winitzki, Introduction to Quantum Effects in Gravity, CUP, Cambridge (2007).
C. Nash and S. Sen, Topology and Geometry for Physicists, Dover, New York (1983).
M. Nakahara, Geometry, Topology and Physics, Adam Hilger, Bristol (1990).
H. Năstase, String theory methods for condensed matter physics, CUP, Cambridge (2017).
T. Needham, Visual Complex Analysis, OUP, Oxford (1997).
T. Needham, Visual Differential Geometry and Forms, Princeton University Press, Princeton (2021).
H. C. Ohanian and R. Ruffini, Gravitation and Spacetime, 3rd edition, CUP, Cambridge (2013).
T. Padmanabhan, Gravitation: Foundations and Frontiers, CUP, Cambridge (2010).
D. N. Page, New J. Phys. 7, 203 (2005).
A. Pais, Subtle Is the Lord: The Science and the Life of Albert Einstein, OUP, Oxford (2005).
J. A. Peacock, Cosmological Physics, CUP, Cambridge (1999).
R. Penrose, Techniques of Differential Topology in Relativity, SIAM, Philadelphia (1973).
R. Penrose, The Road to Reality, Vintage, London (2004).
J. Plebański and A. Krasiński, An Introduction to General Relativity and Cosmology, CUP, Cambridge (2006).
E. Poisson, A Relativist's Toolkit, CUP, Cambridge (2004).
E. Poisson and C. M. Will, Gravity, CUP, Cambridge (2014).
W. Rindler, Relativity, OUP, Oxford (2006).
C. Rovelli, General Relativity: The Essentials, CUP, Cambridge (2021).
C. Rovelli and F. Vidotto, Covariant Loop Quantum Gravity, CUP, Cambridge (2015).
L. H. Ryder, Quantum Field Theory, CUP, Cambridge (1985).
L. H. Ryder, Introduction to General Relativity, CUP, Cambridge (2009).
D. W. Sciama, The Physical Foundations of General Relativity, Doubleday & Co., New York (1969).
B. F. Schutz, A First Course in General Relativity, CUP, Cambridge (1985).
B. F. Schutz, Geometrical Methods of Mathematical Physics, CUP, Cambridge (1980).
M. Spivak, Calculus on Manifolds, Westview Press, Boulder (1971).
M. Spivak, A Comprehensive Introduction to Differential Geometry: Vol 1, 3rd Edition, Publish or Perish, Houston (2005).
M. Spivak, A Comprehensive Introduction to Differential Geometry: Vol 2, 3rd Edition, Publish or Perish, Houston (1999).
J. L. Synge, Relativity: The General Theory, North-Holland, New York (1960).
E. F. Taylor, J. A. Wheeler, and E. Bertschinger, Exploring Black Holes, 2nd Edition, available for free download from eftaylor.com/exploringblackholes (2017).
K. S. Thorne and R. D. Blandford, Modern Classical Physics, Princeton University Press, Princeton (2017).
R. M. Wald, General Relativity, University of Chicago Press, Chicago (1984).
S. Weinberg, Gravitation and Cosmology, Wiley, New York (1972).
S. Weinberg, Cosmology, CUP, Cambridge (2008).
A. Zee, Einstein Gravity in a Nutshell, Princeton University Press, Princeton (2013).
B. Zwiebach, A First Course in String Theory, 2nd edition, CUP, Cambridge (2009).
B
Conventions and notation
B.1 Electromagnetic units
B.2 Vectors, 1-forms and tensors
B.3 Covariant derivatives
${ }^{1}$ Almost all books on classical and quantum field theories use Heaviside-Lorentz units, though the famous textbooks on electrodynamics by Landau and Lifshitz and by Jackson do not.
${ }^{2}$ These units are named after the English electrical engineer O. Heaviside (1850-1925) and the Dutch physicist H. A. Lorentz (1853-1928).
${ }^{3}$ We use passive transformations in this subject. That is to say, our transformations change the coordinates describing the position of an event, rather than changing the position of the event itself: the latter being an active transformation. Sidney Coleman notes that, in criminal circles, a passive transformation is analogous to an alias (the criminal is an event, after the transformation they remain at the position of the crime in spacetime, but they look different owing to the transformation), while the active transformation is like an alibi (the criminal/event is transformed to a different position in spacetime to the position of the crime).
${ }^{4}$ Upstairs components are sometimes called contravariant components. We mostly avoid this terminology.
This appendix contains a summary of some of the choices of conventions and notation we have made in the book.
B.1 Electromagnetic units
In SI units, Maxwell's equations in free space can be written as
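With sources included, the familiar SI form of Maxwell's equations is sketched below. The charge density $\rho$ and current density $\vec{J}$ are symbols assumed here rather than taken from the text; setting both to zero gives the source-free case.

```latex
\begin{align*}
\vec{\nabla}\cdot\vec{E} &= \frac{\rho}{\epsilon_{0}}, &
\vec{\nabla}\cdot\vec{B} &= 0, \\
\vec{\nabla}\times\vec{E} &= -\frac{\partial \vec{B}}{\partial t}, &
\vec{\nabla}\times\vec{B} &= \mu_{0}\vec{J}+\mu_{0}\epsilon_{0}\frac{\partial \vec{E}}{\partial t}.
\end{align*}
```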
Although SI units are preferable for many applications in physics, the desire to make our (admittedly often complicated) equations as simple as possible motivates a different choice of units for the discussion of electromagnetism in field theory.${ }^{1}$ We therefore choose the Heaviside-Lorentz${ }^{2}$ system of units (also known as the 'rationalized Gaussian CGS' system) which can be obtained from SI by setting $\epsilon_{0}=\mu_{0}=1$. Thus, the electrostatic potential $V(\vec{x})=q / 4 \pi \epsilon_{0}|\vec{x}|$ of SI becomes $V(\vec{x})=q / 4 \pi|\vec{x}|$ in Heaviside-Lorentz units, and Maxwell's equations can be written as
Using our other choice of $c=1$ obviously removes the factors of $c$ too.
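The conventional Heaviside-Lorentz form, with the factors of $c$ kept explicit, is sketched here. As an assumption of this sketch, $\rho$ and $\vec{J}$ denote the charge and current densities measured in Heaviside-Lorentz units.

```latex
\begin{align*}
\vec{\nabla}\cdot\vec{E} &= \rho, &
\vec{\nabla}\cdot\vec{B} &= 0, \\
\vec{\nabla}\times\vec{E} &= -\frac{1}{c}\frac{\partial \vec{B}}{\partial t}, &
\vec{\nabla}\times\vec{B} &= \frac{1}{c}\left(\vec{J}+\frac{\partial \vec{E}}{\partial t}\right).
\end{align*}
```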
B.2 Vectors, 1-forms and tensors
In a particular basis, a vector is described by a set of components. If the basis is rotated, then the components will change, but the length of the vector will be unchanged.${ }^{3}$ Three-vectors (or 3-vectors) have three spatial components [such as $\left(A^{x}, A^{y}, A^{z}\right)$ in a Cartesian coordinate system] and are denoted by a letter with an arrow on top, such as $\vec{A}$ or $\vec{p}$. The components of 3-vectors are listed with a Roman index taken from the middle of the alphabet: e.g. $A^{i}$, with $i=1,2,3$, so that we can write components $A^{i}=\left(A^{1}, A^{2}, A^{3}\right)$. We sometimes use the names of coordinates for the components: e.g. $A^{i}=\left(A^{x}, A^{y}, A^{z}\right)$. Component labels for vectors are always written in the upstairs position${ }^{4}$ (e.g. $A^{i}$) and never downstairs ($A_{i}$).
In most applications, we deal with $(3+1)$-dimensional spacetime. A four-vector (or 4-vector) that lives in this spacetime is a vector-valued object with a single timelike component and three spacelike components, which themselves form a three-vector. Four-vectors are displayed in bold script (e.g. $\boldsymbol{v}$). All bold-script quantities are coordinate free, existing independently of a specific basis. When referred to a basis, four-vector components are given a Greek index: for example, $v^{\mu}$ where $\mu=0,1,2,3$. We write $v^{\mu}=\left(v^{0}, v^{1}, v^{2}, v^{3}\right)$ or $\left(v^{0}, v^{i}\right)$ or $\left(v^{0}, \vec{v}\right)$. The zeroth component, $v^{0}$, is the timelike part. Basis vectors are written as $\boldsymbol{e}_{\mu}$ (and sometimes $\partial / \partial x^{\mu}$), so we can write a vector in terms of its components as $\boldsymbol{v}=v^{\mu} \boldsymbol{e}_{\mu}$, with a bold part on both sides of the equality.${ }^{5}$
The vector's natural partner is the 1-form. These are written in frame-independent form using bold type with a tilde, e.g. $\tilde{\boldsymbol{A}}$. Like vectors they can be split into components and basis 1-forms, the latter written as $\boldsymbol{\omega}^{\mu}$ (and sometimes $\boldsymbol{d} x^{\mu}$). In terms of components and basis 1-forms, we write $\tilde{\boldsymbol{A}}=A_{\mu} \boldsymbol{\omega}^{\mu}$. Components of 1-forms always have the index written in the down position.${ }^{6}$ An example of a familiar 1-form is the gradient of a function $f\left(x^{\mu}\right)$, whose components are the derivatives $\frac{\partial f}{\partial x^{\mu}}$, which is sometimes written as $\partial_{\mu} f$ and sometimes written using the comma notation such that $\frac{\partial f}{\partial x^{\mu}}=f_{, \mu}$.
We use the Einstein convention that all indices repeated in both an up and down position are summed over. Inner products between 1-forms and tensors are written with angle brackets: $\langle\tilde{\boldsymbol{A}}, \boldsymbol{v}\rangle=A_{\mu} v^{\mu}$. Dot products (or, equivalently, scalar products) between two vectors are written as $\boldsymbol{v} \cdot \boldsymbol{u}=g_{\mu \nu} v^{\mu} u^{\nu}$, where $g_{\mu \nu}$ are the components of the metric.
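The summation convention maps directly onto an explicit index contraction. The following is a minimal sketch, assuming NumPy is available and taking the metric to be the Minkowski metric with signature $(-+++)$; the component values are made up purely for illustration.

```python
import numpy as np

# Minkowski metric components eta_{mu nu} with signature (-+++).
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Illustrative (made-up) components of two vectors and one 1-form.
v = np.array([2.0, 1.0, 0.0, 0.0])    # v^mu
u = np.array([3.0, 0.0, 1.0, 0.0])    # u^mu
A = np.array([1.0, 0.5, 0.0, -2.0])   # A_mu

# <A, v> = A_mu v^mu: the repeated up/down index mu is summed over.
inner = np.einsum("m,m->", A, v)

# v . u = g_{mu nu} v^mu u^nu, here with g = eta.
dot = np.einsum("mn,m,n->", eta, v, u)

# Lowering an index with the metric: v_mu = eta_{mu nu} v^nu.
v_lower = np.einsum("mn,n->m", eta, v)

print(inner, dot, v_lower)
```

Each repeated letter in the `einsum` subscript string plays the role of a summed index, so the string can be read off from the index expression almost verbatim.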
Tensors are treated as slot machines and given bold symbols like $\boldsymbol{T}(\ ,\ )$. Their valence is specified separately in the form $(n, m)$, meaning $n$ slots for 1-forms and $m$ slots for vectors.${ }^{7}$ When the slots are filled, the tensor outputs a number. Components can be extracted using the basis vectors $\boldsymbol{e}_{\mu}$ and basis 1-forms $\boldsymbol{\omega}^{\nu}$ via equations such as the following for a (2,2) tensor: $\boldsymbol{S}\left(\boldsymbol{\omega}^{\mu}, \boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\alpha}, \boldsymbol{e}_{\beta}\right)=S^{\mu \nu}{ }_{\alpha \beta}$. Tensors can be combined using outer products denoted $\otimes$, or wedge products denoted $\wedge$, with the relationship $\boldsymbol{v} \wedge \boldsymbol{u}=\boldsymbol{v} \otimes \boldsymbol{u}-\boldsymbol{u} \otimes \boldsymbol{v}$. Tensors can also be written in terms of their components using this notation
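As an illustration, with the particular valences chosen here as an assumption, a $(2,0)$ tensor $\boldsymbol{T}$, a $(1,1)$ tensor $\boldsymbol{R}$ and an antisymmetric $(0,2)$ tensor $\boldsymbol{F}$ expand in the basis as follows. The last line uses the wedge-product relationship quoted above, and $|\mu\nu|$ restricts the sum to $\mu<\nu$.

```latex
\begin{align*}
\boldsymbol{T} &= T^{\mu\nu}\, \boldsymbol{e}_{\mu} \otimes \boldsymbol{e}_{\nu}, \\
\boldsymbol{R} &= R^{\mu}{}_{\nu}\, \boldsymbol{e}_{\mu} \otimes \boldsymbol{\omega}^{\nu}, \\
\boldsymbol{F} &= F_{\mu\nu}\, \boldsymbol{\omega}^{\mu} \otimes \boldsymbol{\omega}^{\nu}
  = \tfrac{1}{2} F_{\mu\nu}\, \boldsymbol{\omega}^{\mu} \wedge \boldsymbol{\omega}^{\nu}
  = F_{|\mu\nu|}\, \boldsymbol{\omega}^{\mu} \wedge \boldsymbol{\omega}^{\nu}.
\end{align*}
```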
The trace of a tensor is denoted by an italic letter,${ }^{8}$ e.g. $T=T^{\mu}{ }_{\mu}$. Tensor components are sometimes denoted with the coordinates (e.g. $\mu=t, r, \theta, \phi$) and sometimes, equivalently, numbers (e.g. $\mu=1 \ldots 4$). Using the latter, ordered indices $|\mu \nu|$ are arranged such that $\mu<\nu$.
${ }^{5}$ Also in (3+1) dimensions we generally use $\mathcal{V}$ to denote a 4-volume and $V$ for a 3-volume. The invariant 4-volume is usually $\mathrm{d} \mathcal{V}$ and the invariant 3-volume is $\mathrm{d} \Sigma$. Some other texts use $\mathrm{d} \Omega$ for the invariant 4-volume, but we reserve $\mathrm{d} \Omega^{2}=\mathrm{d} \theta^{2}+\sin ^{2} \theta \,\mathrm{d} \phi^{2}$ for the angular part of the spherical line element.
${ }^{6}$ These are sometimes called covariant components.
${ }^{7}$ On the few occasions we want to make an argument about a general matrix, rather than about a tensor, we denote the matrix $\underline{\boldsymbol{X}}$.
${ }^{8}$ The most important tensor in this subject is the metric. This is a $(0,2)$ tensor $\boldsymbol{g}(\ ,\ )$ with components $g_{\mu \nu}=\boldsymbol{g}\left(\boldsymbol{e}_{\mu}, \boldsymbol{e}_{\nu}\right)=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$. In an exception to our rules, the determinant of the metric (not the trace) is denoted $g$. We use the signature $(-+++)$ and specify the components of diagonal matrices by saying, for example, that the components of the Minkowski tensor are $\eta_{\mu \nu}=\operatorname{diag}(-1,1,1,1)$. Indices are raised and lowered with the components of the metric.
${ }^{9}$ If, as in Chapter 30, we do write points on the world line in terms of a set of displacement vectors $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$ then we could write the tangent as
\begin{equation*}
\boldsymbol{u}=\frac{\partial \boldsymbol{X}}{\partial \tau}=\frac{\partial x^{\mu}}{\partial \tau} \frac{\partial \boldsymbol{X}(\tau)}{\partial x^{\mu}}
\end{equation*}
so that the components are $u^{\mu}=\frac{\partial x^{\mu}}{\partial \tau}$ and the basis vectors on the curve are $\boldsymbol{e}_{\mu}=\frac{\partial \boldsymbol{X}(\tau)}{\partial x^{\mu}}$. However, the displacement vector does not transform according to the tensor transformation law so is not useful for general relativity. The modern way of looking at vectors (Chapter 31) is not to invoke the displacement vectors and instead specify the tangent field as
\begin{equation*}
\boldsymbol{u}=\frac{\partial x^{\mu}}{\partial \tau} \frac{\partial}{\partial x^{\mu}}
\end{equation*}
so that the basis vectors are written as $\boldsymbol{e}_{\mu}=\frac{\partial}{\partial x^{\mu}}$.
Tensor fields are functions of position. We write a vector field $\boldsymbol{v}(x)$, meaning that at a point $x$ we output a vector $\boldsymbol{v}$. The point here could be an abstract point $\mathcal{P}$ in a manifold or the coordinates of this point $x^{\mu}(\mathcal{P})$. This is intended to prevent any confusion with the slots carried by the tensor (e.g. the single slot of a vector field that takes a 1-form).
Position vectors, interpreted as pointing between points in spacetime, are not very useful in curved spacetime. Instead, we usually specify a general point in spacetime $\mathcal{P}$ or its coordinate $x^{\mu}(\mathcal{P})$, which are not treated as the components of a vector.${ }^{9}$ The most important vector field in relativity is the velocity, which provides the tangents to a world line $x^{\mu}(\tau)$, which is a curve parametrized by an affine parameter such as the proper time $\tau$. The velocity field is given by $\boldsymbol{u}(x)=\left(\frac{\mathrm{d} x^{\mu}(\tau)}{\mathrm{d} \tau}\right) \boldsymbol{e}_{\mu}$, with the property $\boldsymbol{u} \cdot \boldsymbol{u}=-1$.
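The normalization $\boldsymbol{u} \cdot \boldsymbol{u}=-1$ is easy to check numerically in the simplest case. The sketch below assumes flat spacetime with $c=1$ and a particle moving at constant 3-velocity, so that $u^{\mu}=\gamma(1, \vec{v})$; the function names are hypothetical helpers introduced for this illustration.

```python
import math

def four_velocity(vx, vy, vz):
    """Components u^mu = gamma * (1, vx, vy, vz), in units with c = 1."""
    speed_sq = vx**2 + vy**2 + vz**2
    if speed_sq >= 1.0:
        raise ValueError("speed must be below c = 1")
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (gamma, gamma * vx, gamma * vy, gamma * vz)

def minkowski_dot(a, b):
    """a . b = eta_{mu nu} a^mu b^nu with eta = diag(-1, 1, 1, 1)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

# A particle moving along x at 0.6 c.
u = four_velocity(0.6, 0.0, 0.0)
print(minkowski_dot(u, u))  # close to -1, up to floating-point rounding
```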
In the orthonormal frame, we write components with a hat. So a vector is written as $\boldsymbol{A}=A^{\hat{\alpha}} \boldsymbol{e}_{\hat{\alpha}}$. Indices in an orthonormal frame are raised and lowered with the Minkowski metric with components $\eta_{\mu \nu}=\operatorname{diag}(-1,1,1,1)$. To translate between the orthonormal frame and a coordinate frame, we use the components of a vielbein, written using brackets in expressions such as
For a diagonal metric, with non-zero components $g_{\mu \mu}$, we have the useful square-root rule $\left(e_{\mu}\right)^{\hat{\mu}}=\sqrt{\left|g_{\mu \mu}\right|}$, where no summation is implied.
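The square-root rule can be tried out on a concrete diagonal metric. The sketch below assumes the Schwarzschild metric in units with $G=c=1$, evaluated outside the horizon so that the signs of $g_{\mu\mu}$ match $\eta_{\mu\nu}$, and it assumes the conversion $A^{\hat{\mu}}=\left(e_{\mu}\right)^{\hat{\mu}} A^{\mu}$ (no sum) for a diagonal vielbein. The function names and the component values are hypothetical.

```python
import math

def schwarzschild_diag(r, theta, M=1.0):
    """Diagonal components g_{mu mu} in (t, r, theta, phi) coordinates, G = c = 1."""
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r**2, (r * math.sin(theta))**2]

def vielbein_diag(g_diag):
    """Square-root rule: (e_mu)^{hat mu} = sqrt(|g_{mu mu}|), no summation."""
    return [math.sqrt(abs(g)) for g in g_diag]

g = schwarzschild_diag(r=10.0, theta=math.pi / 2)
e = vielbein_diag(g)

# Made-up coordinate-frame components A^mu of a vector at this point.
A_coord = [1.0, 0.2, 0.0, 0.03]

# Orthonormal-frame components A^{hat mu} (assumed convention, see lead-in).
A_hat = [e_mu * a for e_mu, a in zip(e, A_coord)]

# The invariant A . A must agree in both frames.
norm_coord = sum(g_mu * a**2 for g_mu, a in zip(g, A_coord))
eta = [-1.0, 1.0, 1.0, 1.0]
norm_hat = sum(s * a**2 for s, a in zip(eta, A_hat))
print(norm_coord, norm_hat)
```

Agreement of the two norms is the point of the orthonormal frame: the metric there is always $\eta_{\hat{\mu}\hat{\nu}}$, whatever the coordinates.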
B.3 Covariant derivatives
The most useful derivative in relativity is the covariant derivative, which is written in frame-independent form as $\boldsymbol{\nabla}_{\boldsymbol{u}}$, which is equivalent to $\boldsymbol{\nabla}_{\boldsymbol{u}}=\boldsymbol{u} \cdot \boldsymbol{\nabla}_{\boldsymbol{e}_{\mu}}$, where $\boldsymbol{u}$ is a vector. This is a directional derivative, taken along the direction of the vector $\boldsymbol{u}$. When the direction is given in terms of a basis vector we write $\boldsymbol{\nabla}_{\boldsymbol{e}_{\mu}}=\boldsymbol{\nabla}_{\mu}$. Confusingly, this is not a component expression: the $\mu$ subscript is short for $\boldsymbol{e}_{\mu}$, where $\mu$ labels the direction along which the derivative is taken. In terms of components and a basis, the covariant derivative can be written as
We also use a further notation for the covariant derivative made along a curve $x^{\mu}(\tau)$ with tangent $\boldsymbol{u}(x)$, which we write as $\mathrm{D} \boldsymbol{v} / \mathrm{d} \tau=\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}$ and
where the velocity $\boldsymbol{u}$ is tangent to the curve parametrized by $\tau$.${ }^{10}$
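In a coordinate basis these derivatives take the standard component forms sketched below. The connection coefficients $\Gamma^{\nu}{}_{\mu\alpha}$, and the ordering of their lower indices, are conventions assumed here (they differ between texts); for a symmetric connection the ordering of the lower pair does not matter.

```latex
\begin{align*}
\boldsymbol{\nabla}_{\mu} \boldsymbol{v}
  &= \left(\partial_{\mu} v^{\nu} + \Gamma^{\nu}{}_{\mu\alpha}\, v^{\alpha}\right) \boldsymbol{e}_{\nu}, \\
\frac{\mathrm{D} \boldsymbol{v}}{\mathrm{d} \tau}
  &= \left(\frac{\mathrm{d} v^{\nu}}{\mathrm{d} \tau} + \Gamma^{\nu}{}_{\mu\alpha}\, u^{\mu} v^{\alpha}\right) \boldsymbol{e}_{\nu}.
\end{align*}
```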
Manifolds and bundles
Life is a short affair; We should try to make it smooth, and free from strife.
Euripides (c. 480-c. 406)
Given a shape or a space to understand, such as a two-dimensional spherical surface, we are usually tempted to embed it in a higher dimensional Euclidean space (e.g. three-dimensional space in this case) in order to examine its structure. However, in studying the spacetimes of general relativity, it is certainly not given that the spacetime of our Universe actually lives in some higher dimensional Euclidean space. We must therefore come to terms with a more intrinsic, geometrical description of the fabric of spacetime in terms of a manifold, without relying on the artifice of embedding it in a higher dimensional space. A manifold has only a little mathematical structure of its own. We only insist that a manifold be a smooth space without awkward discontinuities or unusual joints. Manifolds fit naturally with physical models described in terms of classical field theory which rely on this notion of smoothness. Thus, when we examine singularities, we can characterize them as places where the smooth manifold description breaks down. ^(1){ }^{1}
A good working definition of a manifold is a space that looks locally flat and Euclidean. This constrains the space to change smoothly, since a space full of discontinuities cannot look Euclidean at each of its points. Coordinates, functions, and curves can be defined on manifolds. We can perform calculus on manifolds, use them to define vectors and, if we choose, define metrics on them in order to work out lengths and angles. Nature, as explained by general relativity, seems to be based on the metric and so these two separate ingredients, manifold and metric, form the basis of the geometrical description of Nature.
The manifold point of view is a natural one for describing the geometry of physics. We call ordinary three-dimensional space the manifold $\mathbb{R}^{3}$: it is simply the space in which 3-vectors live. Space can then be thought of as the manifold $\mathbb{R}^{3}$ with a flat metric $\boldsymbol{d}(\ ,\ )$ defined on it that can be used to measure lengths. Special relativity asserts that spacetime is the manifold $\mathbb{R}^{4}$ (i.e. the space of 4-vectors) with a flat metric $\boldsymbol{\eta}(\ ,\ )$ defined on it. In general relativity, spacetime is a manifold, usually called $\mathcal{M}$, on which a Lorentz metric $\boldsymbol{g}(\ ,\ )$ is defined. The curvature of the spacetime is related to the matter distribution in spacetime via Einstein's equation.
In this book, we have been doing our physics on a (pseudo) Riemann manifold, which possesses a connection and a metric. In fact, the Riemann manifold is built upon a series of concepts, as shown in Fig. C.1. In this appendix, we take a step back, forgetting many of the mathematical notions we take for granted, such as the distances and times encoded in the metric. We shall deal with manifolds from the primitive point of view that everything needs to be built from scratch. Taking as little baggage as possible with us, we shall attempt to build a set of concepts suitable to describe the geometrical fabric of the Universe. This topic starts with simple notions of sets and intervals and then introduces the study of manifolds and their structure. We finish by describing some simple concepts of fibres and bundles that lie behind the tangent spaces of differential geometry and the physics of gauges.${ }^{2}$
C.1 Preliminaries
C.2 Maps and functions
C.3 One-to-one, into, and onto
C.4 Continuous maps
C.5 Manifolds, coordinates, and charts
C.6 Functions on the manifold
C.7 Differentiation on the manifold
C.8 Compact regions
C.9 Curves
C.10 Tangent spaces
C.11 Fibre bundles
Chapter summary
${ }^{1}$ The material in this appendix lies behind the mathematics presented in this book, particularly the material on differential geometry. Although we have tried to minimize its use in the main body of the book, this mathematics provides the reason why modern general relativity looks the way it does, and is therefore used in many modern [...]. For example, the study of singularities used to understand the structure of black holes and cosmology is especially reliant on the use of many of the ideas introduced here.
${ }^{2}$ For those not inclined to venture any further at this stage, here's an executive summary of the content of this appendix: general relativity takes place [...] mathematical structure that has this smoothness is a manifold. On a manifold, instead of a coordinate transformation we have the diffeomorphism and instead of the idea of a boundary we have the notion of compactness. Tangent vectors live in a manifold called a tangent space. The combination of a manifold and a tangent space is known as a fibre bundle.
Fig. C.1 The Riemann manifold $(\mathcal{M}, \boldsymbol{g})$, with its metric structure, exists at the top of a pyramid of concepts in mathematics.
Fig. C.2 (a) Open interval; (b) closed interval; (c) the union of $A$ and $B$; (d) the intersection of $A$ and $B$; (e) $A$ is a subset of $B$; (f) an open ball in $\mathbb{R}^{2}$; (g) an open cover of a set $A$, which is a subset of $\mathbb{R}^{2}$; (h) a non-Hausdorff space.
C.1 Preliminaries
A space with a metric defined on it is called a metric space. The metric allows us to work out how far points are from other points. We call a space without a metric a topological space. This has less structure, but we tend to describe points in the neighbourhood of other points in terms of parts of the space known as open subsets, which encode its topology. We introduce some of the relevant ideas here and in Fig. C.2. We start with some primitive notions of sets (or collections of objects or points) and intervals (i.e. a subset of points between two end points, or the neighbourhood of points between two end points). A manifold then turns out to be a set with some special properties. Let's start with some definitions:
The open interval $a<x<b$, not including the endpoints $a$ and $b$, is written $(a, b)$ [Fig. C.2(a)]. The closed interval $a \leq x \leq b$, which includes the endpoints $a$ and $b$, is written $[a, b]$ [Fig. C.2(b)].
The expression $p \in A$ denotes that $p$ is an element of the set $A$.
The expression $A \cup B$ denotes the union of sets $A$ and $B$: the set of objects belonging to $A$, $B$ or both [Fig. C.2(c)].
The expression $A \cap B$ denotes the intersection of sets $A$ and $B$: the set of objects belonging to both $A$ and $B$ [Fig. C.2(d)].
The expression $A \subset B$ denotes that $A$ is a subset of $B$ [Fig. C.2(e)].
The expression $\varnothing$ denotes the empty set, containing no elements.
$\mathbb{R}$ is the set of real numbers.
We define $\mathbb{R}^{n}$ to be the $n$-dimensional Euclidean space we usually use for vector algebra. A point in $\mathbb{R}^{n}$ is a sequence of real numbers $\left(x^{1}, x^{2}, x^{3}, \ldots, x^{n}\right)$, sometimes called an $n$-tuple. The space $\mathbb{R}^{n}$ is a metric space, with the distance between points $x$ and $y$ given by
\begin{equation*}
|x-y|=\sqrt{\sum_{i=1}^{n}\left(x^{i}-y^{i}\right)^{2}}
\end{equation*}
An open ball in $\mathbb{R}^{n}$ of radius $r$ centred around a point $y$ consists of the points $x$ such that $|x-y|<r$. This is an example of an open subset of the set $\mathbb{R}^{n}$ [Fig. C.2(f)]. It is some region, usually assumed close to $y$, that doesn't include its boundary.
A set of points $S$ of $\mathbb{R}^{n}$ is open if every point in $S$ has an open neighbourhood entirely within $S$. Such a set can be expressed as a union of open balls. Any reasonable chunk of $\mathbb{R}^{n}$ is open if we don't include its boundary in the set. A collection $O$ of open sets is an open cover of a set $A$ if every point in $A$ lies in at least one of the sets in the collection $O$ [Fig. C.2(g)].
The Hausdorff property of a set in $\mathbb{R}^{n}$ is the feature that any two distinct points have neighbourhoods that don't intersect (i.e. any line can be infinitely subdivided). A non-Hausdorff space is typified by branching [Fig. C.2(h)]. We shall only ever deal with Hausdorff spaces.
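Several of these primitive notions can be tried out directly. The following sketch uses Python's built-in sets for union, intersection, and subset, and represents points of $\mathbb{R}^{n}$ as tuples of floats; the function names are hypothetical helpers introduced for this illustration.

```python
import math

def distance(x, y):
    """Euclidean distance between two points of R^n given as n-tuples."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def in_open_ball(x, centre, r):
    """True if x lies in the open ball of radius r about centre (boundary excluded)."""
    return distance(x, centre) < r

def in_open_interval(x, a, b):
    """True if x lies in (a, b): endpoints excluded."""
    return a < x < b

def in_closed_interval(x, a, b):
    """True if x lies in [a, b]: endpoints included."""
    return a <= x <= b

A = {1, 2, 3}
B = {3, 4}
union = A | B            # objects belonging to A, B or both
intersection = A & B     # objects belonging to both A and B
is_subset = {1, 2} <= A  # {1, 2} is a subset of A

print(union, intersection, is_subset)
print(in_open_ball((0.5, 0.5), (0.0, 0.0), 1.0))  # interior point
print(in_open_ball((1.0, 0.0), (0.0, 0.0), 1.0))  # boundary point, excluded
```

The strict inequality in `in_open_ball` is what makes the ball open: a point on the boundary has no neighbourhood that stays inside the ball.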
With the simple notions defined, we move on to discussing how to relate one element of a space to another element in another space.
C.2 Maps and functions
The basic tool for examining the properties of the various mathematical structures of use in physics is mapping. A map $f$ from space $\mathcal{M}$ to space $\mathcal{N}$ is a rule that associates with an element $x$ of $\mathcal{M}$ a unique element $y$ of $\mathcal{N}$. The idea is shown in Fig. C.3. The simplest map is a${ }^{3}$ real function. For such a function, both $\mathcal{M}$ and $\mathcal{N}$ are the set $\mathbb{R}$ (i.e. the set of real numbers). The function $f$ takes an element $x$ and spits out an element $y$. The notation saying that $f$ maps elements in $\mathcal{M}$ to elements in $\mathcal{N}$ is written as
\begin{equation*}
f: x \mapsto y=f(x) \tag{C.3}
\end{equation*}
When a map is a real-valued function of $n$ variables we write $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$. This simply amounts to saying that we input an $n$-tuple $\left(x^{1}, \ldots, x^{n}\right)$ to the function and output a single number.${ }^{4}$
We can combine different mappings. If we have two maps $f$ and $g$, $f: \mathcal{M} \rightarrow \mathcal{N}$ and $g: \mathcal{N} \rightarrow \mathcal{L}$, then there is a map called the composition of $f$ and $g$ denoted $g \circ f$, which maps $\mathcal{M}$ to $\mathcal{L}$. In ordinary algebra, $g \circ f$ would be written as $g(f(x))$.
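Composition is simple to express in code. The sketch below takes two real functions as stand-ins for $f$ and $g$ (both chosen arbitrarily for illustration) and builds $g \circ f$; it also shows that the order of composition matters.

```python
def compose(g, f):
    """Return the composition g o f, i.e. the map x -> g(f(x))."""
    def g_after_f(x):
        return g(f(x))
    return g_after_f

def f(x):
    """An example map f: R -> R."""
    return x + 1.0

def g(y):
    """An example map g: R -> R."""
    return y ** 2

h = compose(g, f)   # h(x) = g(f(x)) = (x + 1)^2
k = compose(f, g)   # k(x) = f(g(x)) = x^2 + 1

print(h(2.0), k(2.0))
```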
C.3 One-to-one, into, and onto
The points, mapped from the subset of points $S$ in $\mathcal{M}$ to points in $\mathcal{N}$, form a new set $T$ called the image of $S$ under $f$, or $f(S)$. The set $S$ is called the inverse image, i.e. $S=f^{-1}(T)$. Using the notion of images, we can identify several sorts of mapping.
If the map is many-to-one, then the inverse image of some point of $\mathcal{N}$ is not a single point in $\mathcal{M}$ [Fig. C.4(a)].
If every point in $f(S)$ has a unique inverse image point in $S$, then $f$ is said to be one-to-one or 1-1 [Fig. C.4(b) and (c)].
If a map $\mathcal{M} \rightarrow \mathcal{N}$ is defined for all points in $\mathcal{M}$ (i.e. $S=\mathcal{M}$), then the mapping is from $\mathcal{M}$ into $\mathcal{N}$ [Fig. C.4(b) and (c)].
If every point in $\mathcal{N}$ has an inverse image (not necessarily a unique one), we say it is a mapping from $\mathcal{M}$ onto $\mathcal{N}$.
A map that is both 1-1 and onto is called a bijection${ }^{5}$ [Fig. C.4(c)]. Only bijective maps have a unique inverse that is a map, which we denote $f^{-1}: \mathcal{N} \rightarrow S$. This map is then also bijective.
${ }^{3}$ In many texts, such as this one, the terms map and function are used interchangeably.
${ }^{4}$ Of course, we would usually write this as $f\left(x^{1}, \ldots, x^{n}\right)=y$.
${ }^{5}$ Other terms which are used in the mathematical literature to classify functions are: injective = one-to-one; surjective = onto; bijective = both surjective and injective. The 'sur' in 'surjective' is from the French sur meaning on. The 'bi' in bijective reminds you that bijective combines two properties.
Fig. C.3 A function as a mapping: input an element $x$, output an element $y=f(x)$.
Fig. C.4 (a) Many-to-one and (b) into mappings. (c) A bijection, which is both 1-1 and onto.
Fig. C.5 Functions discussed in Example C.1.
Example C.1
Figure C.5 shows examples of functions, $y=f(x)$, which can be thought of as maps from $\mathbb{R}$ to $\mathbb{R}$, i.e. $f: \mathbb{R} \rightarrow \mathbb{R}$.
(i) Figure C.5(a) is 1-1, but not onto.
(ii) Figure C.5(b) is onto but not 1-1.
(iii) Figure C.5(c) is a bijection (i.e. both 1-1 and onto).
(iv) Figure C.5(d) is neither 1-1 nor onto.
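The same four-way classification can be checked mechanically for maps between finite sets. In the sketch below a map is represented as a Python dictionary from domain elements to codomain elements; this finite setting, and all the function names, are assumptions made for illustration (the functions of Fig. C.5 act on all of $\mathbb{R}$).

```python
def is_one_to_one(mapping):
    """True if no two domain elements share the same image."""
    images = list(mapping.values())
    return len(images) == len(set(images))

def is_onto(mapping, codomain):
    """True if every element of the codomain is the image of some element."""
    return set(codomain) <= set(mapping.values())

def is_bijection(mapping, codomain):
    """True if the map is both 1-1 and onto."""
    return is_one_to_one(mapping) and is_onto(mapping, codomain)

def inverse_map(mapping, codomain):
    """Return the inverse map; only a bijection has an inverse that is a map."""
    if not is_bijection(mapping, codomain):
        raise ValueError("only a bijection has an inverse that is a map")
    return {y: x for x, y in mapping.items()}

codomain = {"a", "b", "c"}
one_to_one_only = {1: "a", 2: "b"}              # 1-1, but 'c' is never reached
onto_only = {1: "a", 2: "b", 3: "c", 4: "c"}    # onto, but 3 and 4 collide
bijection = {1: "a", 2: "b", 3: "c"}            # both 1-1 and onto
neither = {1: "a", 2: "a"}                      # neither 1-1 nor onto

for name, m in [("one_to_one_only", one_to_one_only), ("onto_only", onto_only),
                ("bijection", bijection), ("neither", neither)]:
    print(name, is_one_to_one(m), is_onto(m, codomain))
```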
C.4 Continuous maps
The mathematician is often looking for ways to say that two spaces or systems look the same, or at least similar, since this constitutes a useful method of classifying the structure of a space. The way this is done in mathematical physics is via the use of morphisms. Roughly, a morphism is a type of map that preserves structure, allowing us to move between two spaces in order to compare them. Two sorts of morphisms are relevant in geometry: the homeomorphism and the diffeomorphism. The latter is the important morphism for general relativity. We shall first meet the homeomorphism which can be used, in this context, to say that two spaces share the same sort of continuous structure. The diffeomorphism is similar, but also includes the idea that the spaces are differentiable.
A map $\phi: \mathcal{M} \rightarrow \mathcal{N}$ is continuous at point $x$ in $\mathcal{M}$ if any open set of $\mathcal{N}$ containing $\phi(x)$ contains the image of an open set of $\mathcal{M}$ containing $x$.
A homeomorphism is a 1-1, onto map from one space to another which is continuous and whose inverse is continuous.
Two spaces with a homeomorphism between them are said to be homeomorphic. Roughly speaking, a homeomorphism preserves the topological properties of a space, so that its 'overall shape' or 'overall structure' is preserved. In this sense, a homeomorphism allows us to say that a space 'looks like' another space. If this seems abstract, then one example of a homeomorphism to keep in mind is a continuous deformation. If you imagine that objects are made from a mouldable clay then if you can deform one object into another without breaking it, punching holes in the clay, gluing disparate parts or healing up holes already there, the objects are homeomorphic. ^(6){ }^{6}
Example C. 2
Some examples of things that are homeomorphisms and things that aren't are the following. [Some terminology (in italics) is explained later in the chapter.]
I The unit disc is the interior of a unit circle. It is homeomorphic to the interior of a unit square, as the disc can be continuously deformed into the square.
II The graph of a differentiable function is homeomorphic to the domain of the function.
III A differentiable parametrization of a curve is a homeomorphism between the domain of parametrization and the curve.
IV A coffee mug and doughnut can be continuously deformed into one another (Fig. C.6). This continuous deformation is one example of a homeomorphism, so the coffee mug and doughnut are homeomorphic.
V The set $\mathbb{R}^{m}$ is not homeomorphic to $\mathbb{R}^{n}$ if $m \neq n$.
VI The Euclidean real line is not homeomorphic to the circle. (This is because the unit circle is compact, but the real line is not.)
C. 5 Manifolds, coordinates, and charts
As we have said, the symbol R^(n)\mathbb{R}^{n} represents the set of all nn-tuples of real numbers (x^(1),x^(2),x^(3),dots,x^(n))\left(x^{1}, x^{2}, x^{3}, \ldots, x^{n}\right). This is another way of saying it is the ordinary space in which vectors live. It is also known as flat, Euclidean space. We started with the idea that a manifold is a set of points that, locally, looks like R^(n)\mathbb{R}^{n}. A more precise definition is as follows:
The set M\mathcal{M} is a manifold if each point of M\mathcal{M} has an open neighbourhood that is homeomorphic to an open set of R^(n)\mathbb{R}^{n} for some nn.
If an object has some point that at no level of magnification can be made to look like the flat space of R^(n)\mathbb{R}^{n}, then it is not a manifold. Note that a manifold, on its own, does not preserve lengths, angles, or other geometric quantities.
Example C. 3
Some examples of manifolds are the following:
The mm-dimensional space R^(m)\mathbb{R}^{m} itself is a manifold. It looks locally like R^(m)\mathbb{R}^{m}, after all!
The circle S^(1)S^{1} is a manifold. It looks locally like R\mathbb{R} [see Fig. C.7(a)].
The sphere S^(2)S^{2} is a manifold, looking locally like R^(2)\mathbb{R}^{2} [see Fig. C.7(b)]
The torus T^(2)T^{2} is a manifold, looking locally like R^(2)\mathbb{R}^{2} [see Fig. C.7(c)].
A plane with a line jutting out of it (Fig. 31.1 in Chapter 31) is not a manifold.
The point of intersection never looks smooth at any level of magnification.
The double cone (Fig. 31.1) is not a manifold. The position where the apex of one cone touches the other never looks smooth.
Fig. C. 6 A coffee mug can be continuously deformed into a doughnut. Note that this only works because the coffee mug has a handle. A handleless coffee mug is not homeomorphic to the doughnut because, at some stage of the deformation, you would need to rip a hole in the 'deformable clay'. Hole-making is not a continuous deformation.
Fig. C. 7 (a) A circle $S^{1}$ looks locally like $\mathbb{R}$. Note that by 'circle' we mean the one-dimensional space that is the boundary of a disc. (b) A sphere $S^{2}$ looks locally like $\mathbb{R}^{2}$. Note that by 'sphere' we mean the two-dimensional space that is the boundary of a ball (i.e. what is often called a 'spherical surface'). (c) A torus $T^{2}$ is the product space $S^{1} \times S^{1}$ obtained from two circles (shown here as the two circles in bold). It looks locally like $\mathbb{R}^{2}$. A torus is the space describing the surface of an (edible) doughnut.
Fig. C. 8 An open set $U$ of the manifold $\mathcal{M}$ is mapped to the set $\phi(U)$ in $\mathbb{R}^{m}$.
Fig. C. 9 Coordinate neighbourhoods, chosen to cover the unit circle manifold. (a) The homeomorphism $\phi_{1}$ maps a point from the subset $U_{1}$ on the manifold to a value $\theta$. (b) We must exclude the point shown from $U_{1}$, since this could be mapped to both $0$ and $2\pi$. (c) A different subset $U_{2}$ excludes a different point from that excluded from $U_{1}$.
A point $\mathcal{P}$ in an $m$-dimensional manifold $\mathcal{M}$ exists independently of any coordinates. However, we want to be able to identify points like $\mathcal{P}$ on the manifold using our familiar coordinates $\left(x^{1}, \ldots, x^{m}\right)$ which, to remind you, live in $\mathbb{R}^{m}$. To do this, we need to map between the manifold $\mathcal{M}$ and $\mathbb{R}^{m}$. That map won't necessarily be a homeomorphism, since $\mathcal{M}$ and $\mathbb{R}^{m}$ might have different topologies. However, since $\mathcal{M}$ looks like $\mathbb{R}^{m}$ locally, a homeomorphism $\phi$ (which we call a coordinate function) can be constructed which maps between $U$ and $\mathbb{R}^{m}$, where $U$ is called a coordinate neighbourhood; this is an open set of the manifold, $U \subset \mathcal{M}$, which contains $\mathcal{P}$. In mathematical language, $\phi: U \rightarrow \mathbb{R}^{m}$, and this setup is shown pictorially in Fig. C.8. The coordinate function $\phi$ is represented by $m$ real functions of the point $\mathcal{P}$, written as $\left\{x^{1}(\mathcal{P}), \ldots, x^{m}(\mathcal{P})\right\}$. This set is also often called a coordinate, for the sake of brevity. There is lots of scope for confusion here because we generally use $x$ to represent both the functions of $\mathcal{P}$ and the coordinates themselves. A simple shorthand equation to keep in mind is that the coordinates are given by
$$x^{\mu}=\phi(\mathcal{P}).$$
In words: input a point $\mathcal{P}$ from $U \subset \mathcal{M}$ and output a point $x^{\mu}$ in $\mathbb{R}^{m}$. Since $\phi$ is a homeomorphism, and therefore has a unique inverse, we can write things the other way round:
$$\mathcal{P}=\phi^{-1}\left(x^{\mu}\right),$$
which, in words, says that we input a coordinate x^(mu)x^{\mu} to phi^(-1)\phi^{-1} which outputs a point P\mathcal{P} on the open set UU on the manifold.
We have to focus on the coordinate neighbourhood $U \subset \mathcal{M}$, rather than on the whole manifold, because the coordinate function $\phi$ often cannot be 1-1 over the entire manifold (because $\mathcal{M}$ only looks like $\mathbb{R}^{m}$ locally). We only need to be able to map the region of $\mathcal{M}$ close to the point $\mathcal{P}$ to $\mathbb{R}^{m}$ using our particular homeomorphism $\phi$. We can then map the manifold near other points to $\mathbb{R}^{m}$ using a different homeomorphism.
Example C. 4
The unit circle is a manifold. We set up a map $\phi_{1}$ which takes points on the manifold to the coordinate $\theta$ in $\mathbb{R}$, assumed to vary between 0 and $2\pi$ [see Fig. C.9(a)]. However, this won't work for all points in the manifold since the point $\mathcal{P}$ which we describe in $\mathbb{R}$ as $\theta=0$ is also ascribed the point in $\mathbb{R}$ called $\theta=2\pi$. The map $\phi_{1}$ that takes points on the manifold to $\mathbb{R}$ is not 1-1 if we include this point, so we are forced to drop it completely. The subset $U_{1}$ of $\mathcal{M}$, for which $\phi_{1}$ is well defined, then encompasses all of the unit circle except this troublesome point $\mathcal{P}$ [see Fig. C.9(b)].
In general, we may need a collection of open sets, U_(i)U_{i}, and a collection of maps phi_(i)\phi_{i}, to complete our description. We want to be able to patch these U_(i)U_{i} s together to completely cover the manifold.
Example C. 5
Returning to the unit circle, we can come up with an alternative subset of the manifold M\mathcal{M}, called U_(2)U_{2}. This one includes the whole of the circle except a point Q!=P\mathcal{Q} \neq \mathcal{P}. As drawn in Fig. C.9(c), the missing point is the one that a map phi_(2)\phi_{2} would take to pi\pi and/or -pi-\pi. The map phi_(2)\phi_{2} does however take all other points on U_(2)U_{2} to R\mathbb{R} in the open interval -pi-\pi to pi\pi. We see that by missing Q\mathcal{Q} we do capture the point P\mathcal{P} that was not covered by U_(1)U_{1}. Taken together, U_(1)U_{1} and U_(2)U_{2} are seen to cover M\mathcal{M}.
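The two charts of Examples C.4 and C.5 can be sketched in code. This is only an illustration under stated assumptions: the circle is taken to be embedded in the plane so that a point is a pair `(x, y)`, with $\mathcal{P}=(1,0)$ and $\mathcal{Q}=(-1,0)$, and the function names `phi1`, `phi2`, `phi1_inverse` and `transition` are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def phi1(point):
    """Chart on U1 = circle minus P = (1, 0); returns an angle in (0, 2*pi)."""
    x, y = point
    if y == 0 and x > 0:
        raise ValueError("P = (1, 0) is not in U1")
    theta = math.atan2(y, x)
    return theta if theta > 0 else theta + TWO_PI


def phi2(point):
    """Chart on U2 = circle minus Q = (-1, 0); returns an angle in (-pi, pi)."""
    x, y = point
    if y == 0 and x < 0:
        raise ValueError("Q = (-1, 0) is not in U2")
    return math.atan2(y, x)


def phi1_inverse(theta):
    """Inverse of phi1: take a coordinate back to a point on the circle."""
    return (math.cos(theta), math.sin(theta))


def transition(theta):
    """Coordinate change phi2 o phi1^{-1}, meant for the overlap of U1 and U2."""
    return phi2(phi1_inverse(theta))
```

Each chart rejects exactly one point, and the point rejected by one chart is accepted by the other, so the pair covers the circle. On the overlap, `transition` leaves angles in the upper half of the circle unchanged and shifts those in the lower half by $-2\pi$.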
Generally, then, the subsets U_(i)U_{i} are a family of open subsets that, taken together, cover M\mathcal{M}. The map phi_(i)\phi_{i} maps from the subset U_(i)U_{i} onto an open subset of R^(m)\mathbb{R}^{m}. The subset U_(i)U_{i} is called a coordinate neighbourhood. The pair (U_(i),phi_(i))\left(U_{i}, \phi_{i}\right) is called a chart. ^(7){ }^{7} The whole family of charts {(U_(i),phi_(i))}\left\{\left(U_{i}, \phi_{i}\right)\right\} is called an atlas.
Example C. 6
Consider two examples of spaces:
(i) mm-dimensional Euclidean space. A single chart covers all of this space.
(ii) One-dimensional space. There are two possible manifolds: the line R^(1)\mathbb{R}^{1} and the circle S^(1)S^{1}. A single chart covers the line. As we saw before, (at least) two charts are needed to cover the circle.
In having more than one set of coordinates (i.e. more than one chart), we do ask that they are compatible. Consider a manifold M\mathcal{M} with overlapping subsets UU and VV (Fig. C.10). The point P\mathcal{P} lies in the overlapping region. Homeomorphisms are defined such that phi:U rarrR^(m)\phi: U \rightarrow \mathbb{R}^{m} and psi:V rarrR^(m)\psi: V \rightarrow \mathbb{R}^{m}. We have charts (U,phi)(U, \phi) and (V,psi)(V, \psi) and write phi(P)=x^(mu)\phi(\mathcal{P})=x^{\mu} and psi(P)=y^(mu)\psi(\mathcal{P})=y^{\mu}. By defining a composite map that combines the two homeomorphisms, we are able to recover the idea of a coordinate transformation. In order to get from x^(mu)x^{\mu} to y^(mu)y^{\mu} (and motivated by the diagram in Fig. C.10), we write
$$y^{\mu}=\psi \circ \phi^{-1}\left(x^{\mu}\right).$$
In words, input a coordinate x^(mu)x^{\mu} that is taken to a point P\mathcal{P} on M\mathcal{M} and output a coordinate y^(mu)y^{\mu} that corresponds to the same point.
We are now in the position to put everything together and write down a more technical description of a manifold. Of course, there's very little here we haven't seen earlier in words.
A manifold M\mathcal{M} has the following properties
Each PinM\mathcal{P} \in \mathcal{M} lies in at least one open set U_(i)U_{i} (i.e. the {U_(i)}\left\{U_{i}\right\} cover M\mathcal{M} ).
For each $i$ there is a homeomorphism $\phi_{i}: U_{i} \rightarrow \phi_{i}\left(U_{i}\right)$, where $\phi_{i}\left(U_{i}\right)$ is an open subset of $\mathbb{R}^{m}$.
Where any two sets $U_{i}$ and $U_{j}$ overlap, we have a composite map $\phi_{i} \circ \phi_{j}^{-1}$ which takes points in $\phi_{j}\left(U_{i} \cap U_{j}\right) \subset \mathbb{R}^{m}$ to points in $\phi_{i}\left(U_{i} \cap U_{j}\right) \subset \mathbb{R}^{m}$.
C. 6 Functions on the manifold
${ }^{7}$ A chart is often called a coordinate system by physicists.
Fig. C. 10 A coordinate transformation.
Fig. C. 11 A function as a composite map $f \circ \phi^{-1}: \phi(U) \rightarrow \mathbb{R}$.
Fig. C. 12 A function that maps between manifolds.
As introduced above, a function can be thought of as a map. In addition, a function can be thought of as living on a manifold. However, we really only have access to coordinates in $\mathbb{R}^{m}$ and so we often want to input and output these coordinates when interacting with objects defined on the manifold. Let's consider the function $f: \mathcal{M} \rightarrow \mathbb{R}$, which inputs some point $\mathcal{P}$ in the $m$-dimensional manifold $\mathcal{M}$, and outputs a number in $\mathbb{R}$. The point $\mathcal{P}$ lies in the subset $U$ of $\mathcal{M}$. The question then is how we can tell how $f$ assigns a real value to each point on $\mathcal{M}$, when we only have access to coordinates in $\mathbb{R}^{m}$.
The answer is shown in Fig. C.11. The homeomorphism phi\phi takes P\mathcal{P} to coordinate x^(mu)=phi(P)x^{\mu}=\phi(\mathcal{P}) and UU to coordinate neighbourhood phi(U)subR^(m)\phi(U) \subset \mathbb{R}^{m}. This means we can write a composition f@phi^(-1):phi(U)rarrRf \circ \phi^{-1}: \phi(U) \rightarrow \mathbb{R}. In words, we input a coordinate x^(mu)x^{\mu} in the region of R^(m)\mathbb{R}^{m} called phi(U)\phi(U) and output a point y^(1)y^{1} on real line R\mathbb{R}. This is all we ask of this function, which is a machine that takes multidimensional points and outputs a number. The message then is that what we usually call y=f(x^(1),x^(2),dots,x^(m))y=f\left(x^{1}, x^{2}, \ldots, x^{m}\right) should, when dealing with a function on a manifold that takes M\mathcal{M} to R\mathbb{R}, be regarded as
$$y=f \circ \phi^{-1}\left(x^{\mu}\right).$$
Another way of saying this is that f@phi^(-1)(x^(mu))f \circ \phi^{-1}\left(x^{\mu}\right) is the coordinate representation of the function.
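As a concrete illustration of a coordinate representation, here is a minimal sketch. It assumes, purely for the sake of example, that the manifold is the unit circle embedded in the plane, that the chart is the angle $\theta$, and that the function $f$ returns the height of a point; the names `f`, `phi_inverse` and `f_coord` are hypothetical.

```python
import math


def f(point):
    """A function on the manifold: the height (y value) of a point on the circle."""
    x, y = point
    return y


def phi_inverse(theta):
    """Inverse chart: take the coordinate theta to a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def f_coord(theta):
    """Coordinate representation f o phi^{-1}: a coordinate in, a real number out."""
    return f(phi_inverse(theta))
```

In this chart the coordinate representation is simply $\sin\theta$, even though $f$ itself was defined on points of the circle without any reference to $\theta$.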
Example C. 7
Consider a function that maps between two different mm-dimensional manifolds ff : MrarrN\mathcal{M} \rightarrow \mathcal{N} as shown in Fig. C.12. That is, it takes a point P\mathcal{P} in the mm-dimensional manifold M\mathcal{M} to a point f(P)f(\mathcal{P}) on the mm-dimensional manifold N\mathcal{N}. Take a chart (U,phi)(U, \phi) on M\mathcal{M} and a chart (V,psi)(V, \psi) on N\mathcal{N}. Take P\mathcal{P} to be in UU and f(P)f(\mathcal{P}) to be in VV. The function has a coordinate representation
$$\psi \circ f \circ \phi^{-1}: \mathbb{R}^{m} \rightarrow \mathbb{R}^{m},$$
that is, it's an mm-tuple-valued function y^(mu)=f(x^(mu))y^{\mu}=f\left(x^{\mu}\right) as shown in Fig. C.12.
C. 7 Differentiation on the manifold
Let's consider an $m$-dimensional manifold $\mathcal{M}$ and a function $f: \mathcal{M} \rightarrow \mathbb{R}$. We differentiate functions by varying them with respect to coordinates $x^{\mu}$. It might seem like defining differentiation on manifolds should just rely on identifying a function $f$ and a chart $(U, \phi)$ to map onto $\mathbb{R}^{m}$. This would allow us to vary $f \circ \phi^{-1}$ with respect to coordinates $x^{\mu}$, giving partial derivatives like $\frac{\partial}{\partial x^{\nu}}\left(f \circ \phi^{-1}\right)$. It is, unfortunately, not quite that simple. However, the fix we need provides the manifold with a rich structure that makes the extra effort involved in defining differentiation more than worth it.
First, the problem: it would seem reasonable that $f$ should be differentiable if $f \circ \phi^{-1}$ is differentiable, but this is not sufficient. If $\psi$ is another homeomorphism such that $\psi: V \rightarrow \mathbb{R}^{m}$ and $U \cap V \neq \varnothing$ then it's not necessarily the case that $f \circ \psi^{-1}$ is also differentiable. Since we can write that
$$f \circ \psi^{-1}=\left(f \circ \phi^{-1}\right) \circ\left(\phi \circ \psi^{-1}\right),$$
we need the coordinate transformation phi@psi^(-1)\phi \circ \psi^{-1} to be differentiable in order that f@psi^(-1)f \circ \psi^{-1} is also differentiable.
Therefore, in order to be able to differentiate on a manifold with charts (U_(i),phi_(i))\left(U_{i}, \phi_{i}\right), we need that if a pair of regions U_(i)U_{i} and U_(j)U_{j} overlaps such that U_(i)nnU_(j)!=O/U_{i} \cap U_{j} \neq \varnothing, then the map
$$\phi_{i} \circ \phi_{j}^{-1}: \phi_{j}\left(U_{i} \cap U_{j}\right) \rightarrow \phi_{i}\left(U_{i} \cap U_{j}\right)$$
should be infinitely differentiable (a property denoted C^(oo)C^{\infty} ). ^(8){ }^{8} We call such homeomorphisms C^(oo)C^{\infty}-related. The idea of compatible charts allows us to construct a maximal atlas, which is the atlas that contains every compatible C^(oo)C^{\infty}-related chart. (This allows us to ensure that two equivalent spaces with different atlases aren't actually two different manifolds.) A differentiable manifold is then specified by the set M\mathcal{M} and its (unique, maximal) atlas of C^(oo)C^{\infty}-related charts {U}\{U\}. Defined in this way, the differentiable manifold carries a significant amount of structure.
Example C. 8
Let's consider the simplest possible differentiable manifold. Take a manifold N\mathcal{N} to be R\mathbb{R} and a homeomorphism eta:RrarrR\eta: \mathbb{R} \rightarrow \mathbb{R} to be the identity x|->xx \mapsto x (or, more simply, eta(x)=x)\eta(x)=x). The manifold N\mathcal{N} taken with the maximal atlas that contains the identity is a differentiable manifold. This is because ^(9){ }^{9} the identity eta(x)=x\eta(x)=x is indeed C^(oo)C^{\infty}.
Having fixed up differentiation on a manifold in terms of its homeomorphisms, we can characterize a differentiable function between manifolds ^(10){ }^{10} (referring to Fig. C. 12 again).
A function that maps between manifolds f:MrarrNf: \mathcal{M} \rightarrow \mathcal{N} is differentiable if for every coordinate system (phi,U)(\phi, U) in M\mathcal{M} and (psi,V)(\psi, V) in N\mathcal{N}, the map psi@f@phi^(-1):R^(m)rarrR^(n)\psi \circ f \circ \phi^{-1}: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n} is differentiable.
We now have a concept of a differentiable manifold and a differentiable map between manifolds. If we further insist that the map psi@f@phi^(-1)\psi \circ f \circ \phi^{-1} is invertible (i.e. that there exists a map phi@f^(-1)@psi^(-1)\phi \circ f^{-1} \circ \psi^{-1} ), and that both y=psi@f@phi^(-1)(x)y=\psi \circ f \circ \phi^{-1}(x) and x=phi@f^(-1)@psi^(-1)(y)x=\phi \circ f^{-1} \circ \psi^{-1}(y) are C^(oo)C^{\infty}, then M\mathcal{M} is said to be diffeomorphic to N\mathcal{N}, and the map ff is called a diffeomorphism. Because the map is invertible, the dimension of M\mathcal{M} has to equal that of N\mathcal{N}, i.e. m=nm=n. Two diffeomorphic manifolds can be regarded as essentially the same manifold. ^(11){ }^{11}
Example C. 9
The map $f: \mathbb{R} \rightarrow \mathbb{R}$ given by $f(x)=x$ is a diffeomorphism. However, the map $g: \mathbb{R} \rightarrow \mathbb{R}$ given by $g(x)=x^{2}$ is not a diffeomorphism because it is not 1-1 (e.g. $g(2)=4$, but also $g(-2)=4$). The map $h: \mathbb{R} \rightarrow \mathbb{R}$ given by $h(x)=x^{3}$ is not a diffeomorphism either. Although $h$ is a 1-1 map, its inverse $h^{-1}(x)=x^{1/3}$ is not sufficiently smooth (i.e. $C^{\infty}$) at $x=0$, since its first derivative is not defined. The map $\phi: \mathbb{R}^{2} \rightarrow \mathbb{R}^{2}$ given by $\phi(x, y)=\left(x+\frac{y}{2}, y-\frac{x}{2}\right)$ is a diffeomorphism; the determinant of the Jacobian of the map is non-zero everywhere ${ }^{12}$ and so the map is invertible.
${ }^{8}$ If $f\left(x_{1}, \ldots, x_{n}\right)$ is a function defined on an open region $S$ of $\mathbb{R}^{n}$, then it is differentiable of class $C^{k}$ if all of the partial derivatives of order less than or equal to $k$ exist and are continuous functions on $S$. A special case is the $C^{\infty}$ (or smooth) function: a map is $C^{\infty}$ if the coordinates of a point in $\mathcal{N}$ are infinitely differentiable functions of the coordinates of the inverse image of the point in $\mathcal{M}$. All polynomial functions are $C^{\infty}$. By contrast, a function like $x^{\frac{1}{3}}$ has a first derivative that is not continuous at the origin (where it blows up), so it is not $C^{\infty}$.
${ }^{9}$ The identity map $x \mapsto x$, i.e. $\eta(x)=x$, is continuous; its first derivative is unity, which is continuous. Subsequent derivatives are zero, which is continuous too.
${ }^{10}$ We assume manifold $\mathcal{M}$ is $m$ dimensional and manifold $\mathcal{N}$ is $n$ dimensional.
${ }^{11}$ A diffeomorphism can only apply to manifolds. This is because of the local smoothness of a manifold that follows from it resembling $\mathbb{R}^{n}$ locally. In contrast, a homeomorphism can apply between things that aren't manifolds.
${ }^{12}$ The Jacobian matrix of the map is the matrix of partial derivatives $\left(\partial \phi^{i} / \partial x^{j}\right)$, where in this case $\phi^{1}=x^{1}+\frac{x^{2}}{2}$ and $\phi^{2}=x^{2}-\frac{x^{1}}{2}$. The determinant of this matrix is called the Jacobian, and in this case it is equal to $\frac{5}{4}$ which, crucially, is non-zero.
${ }^{13}$ Beyond general relativity, diffeomorphisms are useful in mechanics where they allow an insightful geometrical description of Hamiltonian mechanics. See Geroch's book Geometrical Quantum Mechanics for an introduction.
${ }^{14}$ The very important Killing vector fields that tell us about conserved quantities are therefore also most generally given in terms of diffeomorphisms.
${ }^{15}$ Recall that we had
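Two claims from Example C.9 can be checked with a few lines of exact and numerical arithmetic: that the Jacobian of $\phi(x, y)=\left(x+\frac{y}{2}, y-\frac{x}{2}\right)$ is $\frac{5}{4}$, and that the slope of $x^{1/3}$ grows without bound near the origin. This is a sketch only; the helper names (`det2`, `phi_inverse`, `cube_root_slope`) are hypothetical and the inverse is written out by hand for this particular linear map.

```python
from fractions import Fraction


def phi(x, y):
    """The linear map of Example C.9."""
    return (x + y / 2, y - x / 2)


# Jacobian matrix d(phi^i)/d(x^j); constant because phi is linear.
J = [[Fraction(1), Fraction(1, 2)],
     [Fraction(-1, 2), Fraction(1)]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def phi_inverse(u, v):
    """Inverse of phi, from the 2x2 inverse-matrix formula (needs det != 0)."""
    d = det2(J)
    return ((u - v / 2) / d, (u / 2 + v) / d)


def cube_root_slope(eps):
    """Slope of the chord of x**(1/3) between 0 and eps, for eps > 0."""
    return eps ** (1.0 / 3.0) / eps
```

For instance, `phi_inverse(*phi(Fraction(3), Fraction(-2)))` returns the starting pair, while `cube_root_slope` keeps growing as its argument shrinks, which is the numerical face of the statement that $h^{-1}$ has no derivative at $x=0$.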
Note that a homeomorphism is basically a diffeomorphism without the differentiability requirement. One way to think about it is that calling a map between two spaces a homeomorphism means that you can deform one space to the other continuously. Calling a map between two spaces a diffeomorphism tells you something extra; it means that it is possible to deform one space to the other smoothly, and that smoothness of the coordinate transformations is independent of the coordinates chosen.
Why are diffeomorphisms so essential for general relativity? A diffeomorphism phi:MrarrM\phi: \mathcal{M} \rightarrow \mathcal{M} maps a point to another point in the same manifold. Such diffeomorphisms are analogous to active coordinate transformations that transform from one point to another point. If the physics is unaffected by this transformation then this tells us about points that are the same (or indistinguishable). Diffeomorphisms therefore reveal the gauge symmetries of general relativity. ^(13){ }^{13} As a result, general relativity is sometimes called a diffeomorphism-invariant theory. Moreover, diffeomorphisms allow us to compare tensors defined at different points on a manifold and so the definition of the Lie derivative £_(u)£_{\boldsymbol{u}} (Chapter 33) is most generally given in terms of diffeomorphisms. ^(14){ }^{14} In this case, the flow along the integral curves is represented by a diffeomorphism, where the vector field u\boldsymbol{u} encoding this flow is referred to as the generator of the diffeomorphism. We use this in the next example.
Example C. 10
We can use invariance with respect to diffeomorphisms to justify one of the most fundamental equations in relativity: grad*T=0\boldsymbol{\nabla} \cdot \boldsymbol{T}=0. Let's use the machinery of Chapter 40 and examine the variation of the matter Lagrangian with the components of the metric
If the variations in the metric components are generated by diffeomorphisms, we have ^(15)deltag_(mu nu)=(£_(u)g)_(mu nu)=2u_((mu;nu)){ }^{15} \delta g_{\mu \nu}=\left(£_{\boldsymbol{u}} \boldsymbol{g}\right)_{\mu \nu}=2 u_{(\mu ; \nu)}. For a diffeomorphism-invariant theory we have delta S=0\delta S=0 and so, as a result, we write
where we drop the symmetrization of the covariant derivative of u\boldsymbol{u}, as delta(sqrt(-g)L_(m))//deltag_(mu nu)\delta\left(\sqrt{-g} \mathcal{L}_{\mathrm{m}}\right) / \delta g_{\mu \nu} is symmetric, and so the integral is unaffected by the presence of the symmetrization. Integrating by parts, we find
This must be true for diffeomorphisms generated by an arbitrary field u\boldsymbol{u} and so, using the definition of T\boldsymbol{T} in terms of the action from Chapter 40, i.e.
we can see that the integrand is equivalent to (grad_(mu)T)^(mu nu)=T_(;mu)^(mu nu)=0\left(\boldsymbol{\nabla}_{\mu} \boldsymbol{T}\right)^{\mu \nu}=T_{; \mu}^{\mu \nu}=0 or grad*T=0\boldsymbol{\nabla} \cdot \boldsymbol{T}=0.
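The chain of steps described in this example can be sketched as follows. This is a reconstruction under assumptions that are not confirmed here: that the matter action is $S=\int \mathrm{d}^{4}x\, \sqrt{-g}\, \mathcal{L}_{\mathrm{m}}$, that the Chapter 40 definition is $T^{\mu\nu}=\frac{2}{\sqrt{-g}} \frac{\delta\left(\sqrt{-g} \mathcal{L}_{\mathrm{m}}\right)}{\delta g_{\mu\nu}}$, and that boundary terms vanish. Other sign or index conventions would change the prefactors but not the conclusion.

```latex
\begin{align}
\delta S &= \int \mathrm{d}^{4}x\,
  \frac{\delta\left(\sqrt{-g}\,\mathcal{L}_{\mathrm{m}}\right)}{\delta g_{\mu\nu}}\,
  \delta g_{\mu\nu}
  = \frac{1}{2}\int \mathrm{d}^{4}x\,\sqrt{-g}\;T^{\mu\nu}\,\delta g_{\mu\nu}, \\
0 = \delta S &= \int \mathrm{d}^{4}x\,\sqrt{-g}\;T^{\mu\nu}\,u_{\mu;\nu}
  \qquad \left(\delta g_{\mu\nu}=2u_{(\mu;\nu)},\ T^{\mu\nu}\ \text{symmetric}\right), \\
0 &= \int \mathrm{d}^{4}x\,\sqrt{-g}\,
  \left[\left(T^{\mu\nu}u_{\mu}\right)_{;\nu}-T^{\mu\nu}{}_{;\nu}\,u_{\mu}\right]
  = -\int \mathrm{d}^{4}x\,\sqrt{-g}\;T^{\mu\nu}{}_{;\nu}\,u_{\mu}.
\end{align}
```

The total-divergence term integrates to a boundary contribution, which is assumed to vanish. Since $\boldsymbol{u}$ is arbitrary, the remaining integrand must vanish, and the symmetry of $T^{\mu\nu}$ lets the contracted index be written in either slot.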
C. 8 Compact regions
We have stated that a closed interval between points aa and bb on the real line R\mathbb{R} is written [a,b][a, b]. It has the nice property that it encompasses a finite interval, doesn't have any holes in it, and includes its boundary. This notion can be generalized for a region of a manifold and the general term we will use is compact. If we say a region is compact, we mean that it doesn't do things like (i) go off to infinity; (ii) have bits removed; nor (iii) have bits of its boundary removed. One of the ways of achieving this is to insist that any sequence of points in our region must have a limit (or accumulation point) that also lies in the region. ^(16){ }^{16}
Example C. 11
A closed interval [0,1][0,1] on R\mathbb{R} is compact, as can be seen by considering the sequence of points defined by 1//n1 / n where n > 0n>0 is a positive integer, each of which lies within that interval, but crucially so does its limit (the point reached when n rarr oon \rightarrow \infty, which is 0 and is a member of the set of points defined by [0,1][0,1] ). This would not work if we removed the point 0 on the boundary by considering ( 0,1 ] instead of [0,1][0,1]. We conclude that [0,1][0,1] is compact, but (0,1](0,1] is not compact.
We have defined compactness as an 'upgrade' of the notion of a closed interval [a,b][a, b] on R\mathbb{R}, so it is not surprising that such a closed interval is compact, but this statement is termed the Heine-Borel theorem. ^(17){ }^{17} In fact, it is also possible to show that a subset of real numbers is compact if and only if it is closed and bounded. ^(18){ }^{18} A compact space can therefore be thought of as a space which, if it has a boundary, includes the boundary as part of the space, and it has no missing parts. Some examples of spaces that are and aren't compact are given below.
Example C. 12
The closed unit disc is compact, as is the sphere S^(2)S^{2} and the torus T^(2)T^{2}.
The Euclidean plane is not compact (it contains points that run off to infinity). Neither is the open unit disc (points on its boundary are not included in the space). Nor is the closed disc with a hole in it (it has a missing region).
C. 9 Curves
A curve on a manifold can be described using a parametrization, with a real number $\lambda$ telling us how far we are along the curve. Thus, if we take the map $c:[a, b] \rightarrow \mathcal{M}$ as shown, the real number $\lambda$ on the real line $\mathbb{R}$ is mapped on to the point $c(\lambda) \in \mathcal{M}$, producing a curve on $\mathcal{M}$ as $\lambda$ runs from $a$ to $b$. As usual, the homeomorphism $\phi$ maps from $\mathcal{M}$ to a chart in $\mathbb{R}^{m}$. Therefore, the coordinate on the curve on $\mathcal{M}$ corresponding to some value of $\lambda$ is given by the composite map
$$\phi \circ c:[a, b] \rightarrow \mathbb{R}^{m}.$$
In short, input a parameter $\lambda$ and output a point on the curve $\left(x^{1}, \ldots, x^{m}\right)=\phi \circ c(\lambda)$ in $\mathbb{R}^{m}$ (see Fig. C.13).
${ }^{16}$ An alternative, and rather grand and formal, way of defining a compact region is to say that a region $\mathcal{A}$ is compact if every open cover $O$ contains a finite sub-collection of open sets which also cover $\mathcal{A}$.
${ }^{17}$ We state this theorem without proof here, but see e.g. Spivak's Calculus on Manifolds for the full story on this and other theorems about topological spaces. The theorem is named in honour of the German mathematician Eduard Heine (1821-1881) and the French mathematician Émile Borel (1871-1956).
${ }^{18}$ The term bounded means that the set doesn't run off to infinity but is enclosed within some finite region; to define a region of a manifold as bounded requires a notion of distance, i.e. it applies only to spaces endowed with a metric.
${ }^{19}$ This is the idea of a fibre bundle, examined in more detail in the next section.
Fig. C. 13 A curve $c(\lambda)$, expressed in $\mathbb{R}^{m}$ by a homeomorphism $\phi$.
Fig. C. 14 A vector field in terms of mappings.
C. 10 Tangent spaces
Next, we build up to the notion of a vector. An arrow does not properly represent a vector on a manifold. There's no origin or concept of straightness, after all. The vectors we shall discuss are tangent vectors, that is, directional derivatives to curves that live in a manifold. A tangent vector does not live in the same manifold in which curves live, but instead lives in a manifold called a tangent space. There is not one tangent space but many: one, in fact, for each point on the manifold. The tangent spaces can be thought of as floating above the manifold. ^(19){ }^{19}
We shall generalize the procedure of finding a tangent vector as the directional derivative along a curve by declaring:
The tangent vector at c(lambda=0)c(\lambda=0) is defined as the directional derivative of a function f(c(lambda))f(c(\lambda)) along the curve c(lambda)c(\lambda), evaluated at lambda=0\lambda=0.
A curve $c(\lambda)$ takes the parameter $\lambda$ from $\mathbb{R}$ and puts it on the manifold. To take it from the manifold back to $\mathbb{R}$ we need a function $f: \mathcal{M} \rightarrow \mathbb{R}$. Once we have this, we can evaluate the rate of change of the curve with the parameter $\lambda$ as follows:
$$\binom{\text{Rate of change of } f(c(\lambda)) \text{ along}}{\text{the curve, evaluated at } \lambda=0}=\left.\frac{\mathrm{d} f(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0}=\left.\frac{\mathrm{d}(f \circ c)}{\mathrm{d} \lambda}\right|_{\lambda=0},$$
where, in the last part, we've written out the composition.
Using the homeomorphism $\phi: \mathcal{M} \rightarrow \mathbb{R}^{m}$, we can map the point on the manifold into $\mathbb{R}^{m}$, as shown in Fig. C.14, and then the combination $f \circ c$ can be written as
$$f \circ c=\left(f \circ \phi^{-1}\right) \circ(\phi \circ c),$$
where the first bracket $\left(f \circ \phi^{-1}\right)$ is a real-valued function of a point in $\mathbb{R}^{m}$ [that is, $f \circ \phi^{-1}=f\left(x^{\mu}\right)$] and the second bracket $(\phi \circ c)$ takes a point $\lambda$ from $\mathbb{R}$ and returns a point in $\mathbb{R}^{m}$ [that is, it maps out the curve in coordinate space and could therefore be written as $x^{\mu}(c(\lambda))$]. Since $f \circ \phi^{-1}$ and $\phi \circ c$ are coordinate representations of the function and curve respectively, we can write the derivative in terms of a Leibniz (or chain) rule
$$\left.\frac{\mathrm{d}(f \circ c)}{\mathrm{d} \lambda}\right|_{\lambda=0}=\left.\frac{\partial f}{\partial x^{\mu}} \frac{\mathrm{d} x^{\mu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0}.$$
It is this expression that we use to define a vector $\boldsymbol{v}[f]$. It features the differential operator $\frac{\partial}{\partial x^{\mu}}$, which supplies the basis vectors, acting on a function $f$. It comes with a factor $\left.\frac{\mathrm{d} x^{\mu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0}$, which we call the $\mu$th component and which tells us about the rate of change of the curve $c$ with $\lambda$, when it is projected into $\mathbb{R}^{m}$. As a result, the tangent vector $\boldsymbol{v}$ is defined as
$$\boldsymbol{v}=\left.\frac{\mathrm{d} x^{\mu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0} \frac{\partial}{\partial x^{\mu}},$$
where the order of the terms on the right-hand side has been swapped to conform to our usual convention of writing v=v^(mu)e_(mu)\boldsymbol{v}=v^{\mu} \boldsymbol{e}_{\mu}.
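The definition can be tested numerically: the component form $v^{\mu}\, \partial f / \partial x^{\mu}$ should agree with differentiating $f \circ c$ directly. The sketch below assumes a two-dimensional chart and picks a curve and a function purely for illustration; `curve_coords` plays the role of $\phi \circ c$ and `f_coords` the role of $f \circ \phi^{-1}$, and all names and the finite-difference step are hypothetical choices.

```python
import math


def curve_coords(lam):
    """phi o c: a hypothetical curve in a 2D chart, at (1, 0) when lam = 0."""
    return (math.cos(lam), math.sin(2 * lam))


def f_coords(x):
    """f o phi^{-1}: a hypothetical smooth function written in the same chart."""
    return x[0] ** 2 + 3 * x[0] * x[1]


def derivative(g, t, h=1e-6):
    """Central-difference estimate of dg/dt at t."""
    return (g(t + h) - g(t - h)) / (2 * h)


def components(lam0=0.0):
    """Components v^mu = d x^mu(c(lam)) / d lam, evaluated at lam0."""
    return [derivative(lambda lam, mu=mu: curve_coords(lam)[mu], lam0)
            for mu in range(2)]


def partials(x):
    """Partial derivatives of f_coords with respect to each coordinate at x."""
    grads = []
    for mu in range(len(x)):
        def along_axis(t, mu=mu):
            shifted = list(x)
            shifted[mu] = t
            return f_coords(shifted)
        grads.append(derivative(along_axis, x[mu]))
    return grads


v = components()
grad = partials(curve_coords(0.0))
v_of_f = sum(v_mu * df_mu for v_mu, df_mu in zip(v, grad))    # v^mu * df/dx^mu
direct = derivative(lambda lam: f_coords(curve_coords(lam)), 0.0)  # d(f o c)/d lam
```

For this particular choice the components are close to $(0, 2)$ and the partial derivatives at $(1, 0)$ are close to $(2, 3)$, so both `v_of_f` and `direct` come out close to 6.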
Example C. 13
The simplest example results if we let the function ff be the coordinate function x^(nu)=phi^(nu)@c(lambda)x^{\nu}=\phi^{\nu} \circ c(\lambda) or
$$\boldsymbol{v}\left[x^{\nu}\right]=\left.\frac{\mathrm{d} x^{\mu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0} \frac{\partial x^{\nu}}{\partial x^{\mu}}=\left.\frac{\mathrm{d} x^{\nu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0},$$
thereby measuring the rate of change of the coordinate component x^(nu)x^{\nu} with lambda\lambda.
We can generalize and say that a tangent vector works on a number of functions (e.g. $f$ and $g$) taken from the set $\mathcal{F}=C^{\infty}(\mathcal{M})$ of all smooth functions from $\mathcal{M}$ to $\mathbb{R}$ (or $f, g \in \mathcal{F}$). We may then firm up the definition a little at this point and say that a tangent vector $\boldsymbol{v}$ at a point $\mathcal{P} \in \mathcal{M}$ is a map $\boldsymbol{v}: \mathcal{F} \rightarrow \mathbb{R}$, which is (i) linear and (ii) obeys the Leibniz rule, or
$$\boldsymbol{v}[\alpha f+\beta g]=\alpha \boldsymbol{v}[f]+\beta \boldsymbol{v}[g], \qquad \boldsymbol{v}[f g]=f\, \boldsymbol{v}[g]+g\, \boldsymbol{v}[f],$$
where $\alpha$ and $\beta$ are arbitrary numbers and the functions are evaluated at the point $\mathcal{P}$. In this way of viewing tangent vectors, there is a one-to-one correspondence between vectors and derivatives. It is consistent, therefore, to adopt the view that instead of vectors, we can work with derivatives.
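Both defining properties are easy to test for a concrete derivative operator. The sketch below assumes a hypothetical point $\mathcal{P}$ and hypothetical components $v^{\mu}$ in a two-dimensional coordinate patch, and acts on functions with $v^{\mu} \partial / \partial x^{\mu}$ estimated by finite differences. It then compares the two sides of the linearity condition and of the Leibniz rule.

```python
import math

P = (0.5, 1.0)    # hypothetical coordinates of the point P
V = (2.0, -1.0)   # hypothetical components v^mu at P

def v(func, point=P, comps=V, h=1e-6):
    """Act with v = v^mu d/dx^mu on func, evaluated at `point`."""
    total = 0.0
    for mu, v_mu in enumerate(comps):
        up = list(point)
        dn = list(point)
        up[mu] += h
        dn[mu] -= h
        total += v_mu * (func(up) - func(dn)) / (2 * h)
    return total

# Two hypothetical smooth functions on the coordinate patch.
def f(x):
    return math.sin(x[0]) * x[1]

def g(x):
    return x[0] ** 2 + math.exp(x[1])

alpha, beta = 3.0, -2.0

# (i) linearity: v[alpha f + beta g] = alpha v[f] + beta v[g]
lin_lhs = v(lambda x: alpha * f(x) + beta * g(x))
lin_rhs = alpha * v(f) + beta * v(g)

# (ii) Leibniz rule: v[f g] = f v[g] + g v[f], with f and g evaluated at P
leib_lhs = v(lambda x: f(x) * g(x))
leib_rhs = f(P) * v(g) + g(P) * v(f)
```

The two sides of each condition agree up to the finite-difference error, as expected for any first-order derivative operator.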
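There is a class of curves that all pass through $\mathcal{P}$. If they all have the same tangent vector, then we can identify them. If we have two curves $c_{1}$ and $c_{2}$ with $c_{1}(0)=c_{2}(0)=\mathcal{P}$ and
$$\left.\frac{\mathrm{d} x^{\mu}\left(c_{1}(\lambda)\right)}{\mathrm{d} \lambda}\right|_{\lambda=0}=\left.\frac{\mathrm{d} x^{\mu}\left(c_{2}(\lambda)\right)}{\mathrm{d} \lambda}\right|_{\lambda=0},$$
then these give the same vector component at $\mathcal{P}$. We identify the tangent vector $\boldsymbol{v}$ with the equivalence class of curves, rather than a single curve.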
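To illustrate the identification of curves, here is a small sketch with three hypothetical curves in a two-dimensional coordinate patch, all passing through the same point at $\lambda=0$. The first two differ only at second order in $\lambda$ and so belong to the same class; the third has a different first derivative. The test function is also an arbitrary choice.

```python
import math

def ddl(func, h=1e-5):
    """Central finite-difference estimate of d(func)/d(lambda) at lambda = 0."""
    return (func(h) - func(-h)) / (2 * h)

# Three hypothetical curves through the point with coordinates (1, 0).
def c1(lam):
    return (1.0 + lam, 2.0 * lam)

def c2(lam):
    # Same first derivative as c1 at lambda = 0, different higher-order terms.
    return (1.0 + math.sin(lam), 2.0 * lam + 5.0 * lam ** 2)

def c3(lam):
    # Different first derivative at lambda = 0.
    return (1.0 + 3.0 * lam, 2.0 * lam)

# A hypothetical smooth test function in coordinates.
def f(x):
    return x[0] * math.exp(x[1])

v1_f = ddl(lambda lam: f(c1(lam)))
v2_f = ddl(lambda lam: f(c2(lam)))
v3_f = ddl(lambda lam: f(c3(lam)))
```

Here `v1_f` and `v2_f` agree (both are close to 3), so `c1` and `c2` define the same tangent vector, whereas `v3_f` is close to 5 and `c3` belongs to a different class.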
Lots of curves passing through $\mathcal{P}$ will have different tangent vectors. All of the tangent vectors at $\mathcal{P}$, one for each class of curves, form a vector space called a tangent space of the manifold $\mathcal{M}$ at point $\mathcal{P}$, denoted $\mathcal{T}_{\mathcal{P}} \mathcal{M}$. This is shown in Fig. C.15. We examine the generalization of the tangent space in the next section.
Fig. C.15 The tangent space contains all of the tangent vectors. (a) A set of vectors in the tangent space at a point $\mathcal{P}$. (b) The point of view used earlier in the book of a tangent plane to a surface relies on embedding the manifold in Euclidean space.
Fig. C.16 (a) The one-dimensional manifold with some tangent vectors identified at each point. (b) The tangent vectors represented as vertical lines. These are the fibres we use to form a bundle.
Fig. C.17 A fibre bundle $\mathcal{B}$ consisting of base space $\mathcal{M}$ and fibres $\mathcal{V}$. The projection $\pi$ collapses a fibre down to a point.
Fig. C.18 A fibre bundle and its projection $\pi$.
C.11 Fibre bundles
Consider a curve, which is itself a one-dimensional manifold $\mathcal{M}$. Take tangents at each point along the curve [some examples are shown in Fig. C.16(a)]. Drawn in this way, neighbouring tangent vectors intersect each other, creating a certain amount of confusion. To avoid this, we could instead draw them as in Fig. C.16(b), rotating them around so they no longer keep bumping into each other. Of course, in this new drawing they no longer so obviously resemble tangent vectors, but let's imagine that we could somehow still encode that information in them. Now lift the tangent vectors for each point off the line so that we have the state of affairs shown in Fig. C.17. This shows the one-dimensional manifold $\mathcal{M}$. Above each and every point in $\mathcal{M}$ there is one (one-dimensional) manifold $\mathcal{V}$ floating above it. The particular tangent vector at a point in $\mathcal{M}$ is represented by a point on the particular manifold $\mathcal{V}$ (like beads on an abacus). We call the manifold $\mathcal{V}$ a fibre. By combining the manifold $\mathcal{M}$ (the points) and all of the $\mathcal{V}$s (the fibres or tangent spaces) we obtain a new, two-dimensional manifold $\mathcal{V} \mathcal{M}$ known as a fibre bundle $\mathcal{B}$ or, in this special case of tangent spaces, a tangent bundle $\mathcal{T} \mathcal{M}$.
More generally, then, a fibre bundle $\mathcal{B}$ is a manifold defined in terms of two other manifolds $\mathcal{M}$ and $\mathcal{V}$. Manifold $\mathcal{M}$ is called the base space and $\mathcal{V}$ is called the fibre. The dimension of the fibre bundle $\mathcal{B}$ is always the sum of the dimensions of $\mathcal{M}$ and $\mathcal{V}$. There are many copies of $\mathcal{V}$ in $\mathcal{B}$. One complete copy of $\mathcal{V}$ stands above each point in $\mathcal{M}$.
We can undo the previous construction if we define a continuous map from $\mathcal{B}$ back down on to $\mathcal{M}$. This is called the canonical projection $\pi$ from $\mathcal{B}$ to $\mathcal{M}$. This map collapses each fibre down to the corresponding point in $\mathcal{M}$. If you like, it forgets about the information stored in the fibre, remembering only the point on $\mathcal{M}$ which the fibre was floating above.
The simplest example of a bundle is a product space of $\mathcal{M}$ with $\mathcal{V}$, which is written as $\mathcal{M} \times \mathcal{V}$. The points in $\mathcal{M} \times \mathcal{V}$ are pairs of elements $(a, b)$ where $a$ belongs to $\mathcal{M}$ and $b$ belongs to $\mathcal{V}$ (see Fig. C.18). This is often called a trivial bundle.
Example C.14
Perhaps the simplest of all tangent bundles is formed by making the base manifold the unit circle. We shall consider $\mathcal{T} S^{1}$, the tangent bundle of the circle $S^{1}$ and its tangent vectors. The tangent bundle $\mathcal{T} S^{1}$ is identical to the product space $S^{1} \times \mathbb{R}$, shown by the cylinder in Fig. C.19 (and so is a trivial bundle). Moreover, $\mathcal{T} S^{1}$ is a two-dimensional manifold which we can cover with coordinates. In $S^{1}$ a point is described by a coordinate $\theta$. A tangent vector $\boldsymbol{v}$ at any point $\mathcal{P}$ can be written as $\boldsymbol{v}=y \frac{\partial}{\partial \theta} \equiv y \boldsymbol{e}_{\theta}$, where $y$ is a coordinate in $\mathcal{T}_{\theta}$ (i.e. an amplitude taken from the tangent space floating above the particular angle $\theta$). The coordinates $(\theta, y)$ then tell us about the position on the base space and the coordinate along the fibre.
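The coordinates $(\theta, y)$ on $\mathcal{T} S^{1}$ can be represented directly. The sketch below is only illustrative: the class name `BundlePoint` and the helper functions are hypothetical, and the picture of the circle embedded in the plane, with $\boldsymbol{e}_{\theta}=(-\sin \theta, \cos \theta)$, is an assumption made for visualization rather than part of the bundle itself.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BundlePoint:
    """A point of T S^1 in the coordinates (theta, y)."""
    theta: float  # position on the base space S^1
    y: float      # coordinate along the fibre, v = y e_theta

def projection(p):
    """Canonical projection: forget the fibre coordinate, keep the angle."""
    return p.theta % (2 * math.pi)

def embed(p):
    """Picture the point in the plane (an embedding assumed for illustration).

    Returns the position on the unit circle and the tangent vector y e_theta,
    with e_theta = (-sin theta, cos theta).
    """
    position = (math.cos(p.theta), math.sin(p.theta))
    vector = (-p.y * math.sin(p.theta), p.y * math.cos(p.theta))
    return position, vector

p = BundlePoint(theta=math.pi / 2, y=2.0)
position, vector = embed(p)
# The tangent vector is perpendicular to the radius in this embedded picture.
dot = position[0] * vector[0] + position[1] * vector[1]
```

Two bundle points whose angles differ by $2 \pi$ project to the same point of the base space, while the fibre coordinate $y$ ranges freely over $\mathbb{R}$.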
Bundles are said to be locally trivial if, in the neighbourhood of each point of the base space, they resemble a product space. We can ask whether they're also globally trivial: whether the whole bundle can be represented by a product $\mathcal{M} \times \mathcal{V}$. The example in Fig. C.19, where the bundle resembles a cylinder, is indeed globally trivial. An interesting counterexample of a bundle that is not globally trivial is a twisted bundle. This resembles $\mathcal{M} \times \mathcal{V}$ locally, but as we move around $\mathcal{M}$, the fibres twist, so that globally $\mathcal{B}$ is different from $\mathcal{M} \times \mathcal{V}$.
Example C.15
Once again, take $\mathcal{M}$ to be the circle $S^{1}$ and $\mathcal{V}$ to be the real line $\mathbb{R}$. The trivial bundle simply resembles a two-dimensional cylinder. Now construct a twisted bundle, by forming the fibres into a Möbius strip$^{20}$ (see Fig. C.20). Locally, this is the same as the cylinder; globally it is not. To see the local similarity, remove a point $\mathcal{P}$ from the base space. We then have a segment $S^{1}-\mathcal{P}$ and the bundle above this segment can be deformed to look the same as the cylinder. It is only when we look at the whole of the base space that we notice the difference. To see this, consider two segments: $S^{1}-\mathcal{P}$ and $S^{1}-\mathcal{Q}$, where $\mathcal{P}$ and $\mathcal{Q}$ are different points. Each is locally trivial (i.e. each can be deformed to look like the cylinder). However, on gluing them together (with a twist) to make a whole we form the Möbius strip.
Formally, we characterize a bundle by looking at its cross section. The cross section of the bundle $\mathcal{B}$ is a continuous image of the base space $\mathcal{M}$ in $\mathcal{B}$, which meets each fibre at a single point (Fig. C.21). This is called the lift of the base space into the bundle. So if we apply the lift via a continuous function $s: \mathcal{M} \rightarrow \mathcal{B}$, followed by the projection $\pi: \mathcal{B} \rightarrow \mathcal{M}$, we get the identity map from $\mathcal{M}$ into itself
$$\pi \circ s=\mathrm{id}_{\mathcal{M}}.$$
For the trivial bundle $\mathcal{M} \times \mathcal{V}$ the cross sections look like continuous functions on $\mathcal{M}$ which take values in the space $\mathcal{V}$. So a cross section of $\mathcal{M} \times \mathcal{V}$ assigns, in a continuous way, a point of $\mathcal{V}$ to each point on $\mathcal{M}$. This is like an extension of the ordinary idea of a graph of a function.
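For the trivial bundle the statement that lifting and then projecting returns the starting point can be spelled out in a few lines. The following sketch takes $\mathcal{M}=S^{1}$ and $\mathcal{V}=\mathbb{R}$, represents a point of the bundle as a pair `(theta, y)`, and uses a hypothetical function `height` to define the section; all of these names are illustrative choices.

```python
import math

def projection(point):
    """pi: B -> M. Keep the base-space coordinate, forget the fibre value."""
    theta, _y = point
    return theta

def height(theta):
    """A hypothetical continuous, 2*pi-periodic function on the circle."""
    return 1.5 + math.sin(2 * theta)

def section(theta):
    """s: M -> B. Lift each base point to the point (theta, height(theta))."""
    return (theta, height(theta))

# Sample the base space and check that pi(s(theta)) returns theta itself.
samples = [2 * math.pi * k / 12 for k in range(12)]
round_trip = [projection(section(theta)) for theta in samples]
```

The list of pairs `section(theta)` is just the graph of `height` drawn on the cylinder, which is the sense in which a cross section generalizes the graph of a function.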
Example C.16
Consider the cylindrical product bundle $\mathcal{M} \times \mathcal{V}$. The cross section looks like a curve intersecting each fibre once as it goes around the cylinder. We can also mark out the curve formed by the zeros of the vectors in each fibre, called the zero section. There's no guarantee that any old cross section intersects this line of zeros. The Möbius bundle is more complicated, but one thing that can be said is that every cross section has to cross the zero section. This gives us a way of characterizing the difference between the Möbius strip and the trivial bundle.
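The difference can be made concrete with a small numerical sketch. We assume that each bundle is cut open at $\theta=0$, so that a cross section is described by a continuous fibre coordinate $h(\theta)$ on $[0,2 \pi]$: for the cylinder the ends are glued with $h(2 \pi)=h(0)$, while for the Möbius strip the twist imposes $h(2 \pi)=-h(0)$. The two example sections and the bisection routine are hypothetical choices made for illustration.

```python
import math

TWO_PI = 2 * math.pi

def find_zero(h, lo=0.0, hi=TWO_PI, tol=1e-10):
    """Bisection search for a zero of h on [lo, hi].

    Returns None when h has the same sign at both ends, in which case
    bisection cannot be started.
    """
    a, b = lo, hi
    ha, hb = h(a), h(b)
    if ha == 0.0:
        return a
    if hb == 0.0:
        return b
    if ha * hb > 0:
        return None
    while b - a > tol:
        mid = 0.5 * (a + b)
        hm = h(mid)
        if hm == 0.0:
            return mid
        if ha * hm < 0:
            b = mid
        else:
            a, ha = mid, hm
    return 0.5 * (a + b)

def cylinder_section(theta):
    """Periodic: h(2*pi) = h(0). This one never touches the zero section."""
    return 2.0 + math.cos(theta)

def mobius_section(theta):
    """Antiperiodic: h(theta + 2*pi) = -h(theta), as the twist requires."""
    return math.cos(theta / 2) + 0.3 * math.sin(3 * theta / 2)

zero = find_zero(mobius_section)
cylinder_min = min(cylinder_section(TWO_PI * k / 1000) for k in range(1001))
```

Because the Möbius gluing forces the fibre coordinate to change sign between the two ends of the cut, continuity guarantees a zero somewhere in between, which the bisection locates. The cylinder section shown stays strictly positive all the way round.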
Fig. C.19 The bundle $\mathcal{T} S^{1}$ floating above its base space.
$^{20}$ August Ferdinand Möbius (1790-1868). The Möbius strip (a two-dimensional surface that, when embedded in three dimensions, has only one side) was discovered by Möbius and, independently, by Johann Benedict Listing.
Fig. C.20 (a) The globally trivial bundle. (b) The bundle with a twist, forming a Möbius strip.
Fig. C.21 The cross section of the bundle $\mathcal{B}$, formed from the lift $s$ of the base space $\mathcal{M}$. This can be thought of as a way of graphing a function on $\mathcal{M}$ in $\mathcal{B}$.
Chapter summary
The set $\mathcal{M}$ is a manifold if each point in $\mathcal{M}$ has an open neighbourhood which has a continuous 1-1 map onto an open set of $\mathbb{R}^{n}$ for some $n$.
Morphisms allow us to map between manifolds. A diffeomorphism relates two manifolds which are endowed with a differentiable structure (meaning that they are smooth) and is the most useful morphism in general relativity. A diffeomorphism that maps a manifold onto itself is equivalent to an active coordinate transformation.
Compact regions don't go off to infinity, and have no parts removed, either from their interior or from their boundary.
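Vectors can be defined in terms of derivatives using curves on the manifold and mappings using the expression
$$\boldsymbol{v}=\left.\frac{\mathrm{d} x^{\mu}(c(\lambda))}{\mathrm{d} \lambda}\right|_{\lambda=0} \frac{\partial}{\partial x^{\mu}}.$$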
A fibre bundle $\mathcal{B}$ is the combination of a base space $\mathcal{M}$ and a fibre space $\mathcal{V}$, with a fibre defined at each point in the base space.